Collaborative filtering
A distinction is often made between two forms of data collection for recommendation systems. Explicit feedback relies on the user giving explicit signals about their preferences, e.g. review ratings. In contrast, implicit feedback refers to non-explicit signals of preference, e.g. user watch time. Traditionally, recommender systems can be split into three types:
Collaborative filtering (CF): CF produces recommendations based on the knowledge of users’ attitudes towards items, that is, it uses the “wisdom of the crowd” to recommend items.
Content-based (CB): CB recommender systems focus on the attributes of the items to recommend other items similar to what the user likes, based on their previous actions or explicit feedback.
Hybrid recommendation systems: Hybrid methods combine CB and CF methods.
In many applications, content-based features are not easy to extract, and thus collaborative filtering approaches are preferred. We therefore only explore collaborative filtering methods from now on.
CF methods typically fall into three types: memory-based, model-based and, more recently, deep-learning-based (Su & Khoshgoftaar, 2009, He et al., 2017). Neighbour-based CF and item-based/user-based top-N recommendations are typical examples of memory-based systems, which use user rating data to compute the similarity between users or items. As mentioned previously, common model-based approaches include Bayesian networks, latent semantic models and Markov decision processes. In this investigation, we use a weighted matrix factorization approach. Later on, we generalize the matrix factorization algorithm via a non-linear neural architecture (a softmax model).
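To make the memory-based family concrete, the NumPy sketch below computes item-item cosine similarity on a small binary interaction matrix and ranks one user's unseen items by their summed similarity to the items that user has already interacted with. The toy matrix and the function names `item_cosine_similarity` and `recommend_items` are hypothetical; this is a generic item-based top-N scheme, not a method used later in this investigation.

```python
import numpy as np

# Hypothetical toy binary matrix: rows are users, columns are items, 1 = observed interaction.
F = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
], dtype=float)


def item_cosine_similarity(F):
    """Cosine similarity between every pair of item columns of F."""
    norms = np.linalg.norm(F, axis=0)          # length of each item column
    norms[norms == 0] = 1.0                    # guard against items nobody interacted with
    return (F.T @ F) / np.outer(norms, norms)  # shape (n_items, n_items)


def recommend_items(F, user, top_n=2):
    """Rank the user's unseen items by summed similarity to the items they have seen."""
    sim = item_cosine_similarity(F)
    scores = sim @ F[user]                     # similarity to the user's observed items
    scores[F[user] > 0] = -np.inf              # never re-recommend an observed item
    return np.argsort(-scores)[:top_n]


print(recommend_items(F, user=0))  # prints [2 3]
```

Item 2 ranks first for user 0 because it co-occurs with both of that user's items, while item 3 only co-occurs with item 1.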
However, our approaches have a number of limitations, such as the inability to model the order of interactions. For instance, Markov chain algorithms (Rendle et al., 2010) can encode not only the same information as traditional CF methods but also the order in which users interacted with the items. Furthermore, the sparsity of the frequency matrix (described later on) makes computations prohibitively expensive in real-world settings without some optimization.
Setup
The next few code cells detail the initial preparatory steps needed to develop our collaborative filtering models, namely: importing the required libraries; re-indexing the IDs of users and artists; constructing an indicator variable for the presence of a user-artist interaction; and finding the most frequently assigned tag of each artist.
from __future__ import print_function
import numpy as np
import pandas as pd
import collections
from IPython import display
from matplotlib import pyplot as plt
import sklearn
import sklearn.manifold
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
tf.logging.set_verbosity(tf.logging.ERROR)
# Configure how pandas DataFrames are displayed.
pd.options.display.max_rows = 10
pd.options.display.float_format = '{:.3f}'.format
# Install Altair and activate its colab renderer.
print("Installing Altair...")
!pip install git+https://github.com/altair-viz/altair.git
import altair as alt
alt.data_transformers.enable('default', max_rows=None)
alt.renderers.enable('colab')
print("Done installing Altair.")
Installing Altair...
Done installing Altair.
# NEEDED FOR GOOGLE COLAB
# from google.colab import auth
#from google.colab import drive
# import gspread
# from oauth2client.client import GoogleCredentials
# drive.mount('/content/drive/')
# os.chdir("/content/drive/My Drive/DCU/fouth_year/advanced_machine_learning/music-recommodation-system")
Helper functions
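def calculate_sparsity(M):
    """
    Computes the fraction of observed entries (the density) of the frequency matrix,
    i.e. the number of observed user-artist pairs divided by the number of possible pairs.
    The sparsity in the usual sense is 1 minus this value.
    """
    matrix_size = M['userID'].nunique() * M['artistID'].nunique()  # Number of possible interactions in the matrix
    num_plays = len(M['weight'])  # Number of observed interactions
    density = num_plays / matrix_size
    return density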
def build_music_sparse_tensor(music_df):
    """
    Args:
      music_df: a pd.DataFrame with `userID`, `artistID` and `weight` columns,
        where the IDs have already been re-indexed to start at 0.
    Returns:
      a tf.SparseTensor representing the feedback matrix. Its dense shape
      [num_users, num_artist] is taken from the global variables of those names.
    """
    indices = music_df[['userID', 'artistID']].values.astype(np.int64)  # SparseTensor indices must be int64
    values = music_df['weight'].values.astype(np.float32)
    return tf.SparseTensor(
        indices=indices,
        values=values,
        dense_shape=[num_users, num_artist])
def preproces_ids(music_df):
    """
    Args:
      music_df: a pd.DataFrame with `userID`, `artistID` and `weight` columns.
    Returns:
      a pd.DataFrame where userIDs and artistIDs are re-indexed to start at 0
      and end at m-1 and n-1 (m users, n artists), respectively, and
      two dictionaries mapping the new ids back to the original ids.
    Note: music_df is modified in place.
    """
    unique_user_ids_list = sorted(music_df['userID'].unique())
    print(unique_user_ids_list[0])  # smallest original userID
    unique_user_ids = dict(zip(range(len(unique_user_ids_list)), unique_user_ids_list))
    unique_user_ids_switched = dict(zip(unique_user_ids_list, range(len(unique_user_ids_list))))
    unique_artist_ids_list = sorted(music_df['artistID'].unique())
    unique_artist_ids = dict(zip(range(len(unique_artist_ids_list)), unique_artist_ids_list))
    unique_artist_ids_switched = dict(zip(unique_artist_ids_list, range(len(unique_artist_ids_list))))
    music_df['userID'] = music_df['userID'].map(unique_user_ids_switched)
    music_df['artistID'] = music_df['artistID'].map(unique_artist_ids_switched)
    return music_df, unique_user_ids, unique_artist_ids
def split_dataframe(df, holdout_fraction=0.1):
    """Splits a DataFrame into training and test sets.
    Args:
      df: a dataframe.
      holdout_fraction: fraction of dataframe rows to use in the test set.
    Returns:
      train: dataframe for training
      test: dataframe for testing
    """
    test = df.sample(frac=holdout_fraction, replace=False)
    train = df[~df.index.isin(test.index)]
    return train, test
Traditional recommender system development relied on explicit feedback, and many models were designed to tackle it as a regression problem. For instance, the input of the model would be a matrix \(F \in R^{m \times n}\) denoting the preference of each of \(m\) users for each of \(n\) items on a scale. In the classic movie ratings example, this preference would be users giving a 1-to-5 star rating to different movies.
This dataset contains implicit feedback: that is, observed logs of user interactions with items, in this instance users' listening counts for artists. However, implicit feedback does not signal negativity in the same way that a 1-star rating would. In our data, a user could listen to songs by an artist only a small number of times, but that does not necessarily mean that the user has an aversion to that artist, e.g. the songs could be part of a playlist curated by another user. Therefore, we construct a binary matrix, which has a value of one if the interaction is observed (i.e. a listening count has been logged between an artist and a user). Note that unobserved artist-user interactions are not stored as explicit zeros; only the observed entries are kept. This is for optimization reasons, explained below.
user_artists = pd.read_csv('data/user_artists.dat', sep='\t')
user_artists['weight'] = 1
artists = pd.read_csv('data/artists.dat', sep='\t')
artists.rename({'id':'artistID'}, inplace=True, axis=1)
user_taggedartists = pd.read_csv(r'data/user_taggedartists-timestamps.dat', sep='\t')
user_taggedartists_years = pd.read_csv(r'data/user_taggedartists.dat', sep='\t')
tags = pd.read_csv(open('data/tags.dat', errors='replace'), sep='\t')
user_taggedartists = pd.merge(user_taggedartists, tags, on=['tagID'])
num_users = user_artists.userID.nunique()
num_artist = artists.artistID.nunique()
collab_filter_df = user_artists
Here, we find the 10 most frequently assigned tags. Then, for each tag assignment, we keep the tag in a new column, top10TagValue, if it is one of these top 10 tags; otherwise, we input ‘N/A’. Note, the next cell can take several minutes to compute.
top_10_tags = user_taggedartists['tagValue'].value_counts().index[0:10]
user_taggedartists['top10TagValue'] = None
for index, row in user_taggedartists.iterrows():
    if row['tagValue'] in top_10_tags:
        user_taggedartists.iloc[index, -1] = row['tagValue']
user_taggedartists.fillna('N/A',inplace=True)
artists = pd.merge(user_taggedartists, artists, on=['artistID'], how='right')[['artistID','name','top10TagValue','tagValue']].fillna('N/A')
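# Most common tagValue per group; the result must be assigned, otherwise this line has no effect.
artists = artists.groupby(['artistID','name','top10TagValue']).agg(lambda x: x.value_counts().index[0]).reset_index()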
artists = artists.drop_duplicates(subset=['artistID'])
assert artists.artistID.nunique() == num_artist
artists.rename({'tagValue':'mostCommonGenre'},axis=1, inplace=True)
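The `iterrows()` loop above visits every tag assignment in Python, which is why it takes minutes. A vectorised pandas sketch of the same two ideas (mask tags outside the top k, then take the most frequent tag per artist) is shown below on a hypothetical toy frame. `assign_top_tags` and `most_common_tag` are illustrative names, and `most_common_tag` groups by `artistID` only, which is a simplification of the grouping used in the cell above.

```python
import pandas as pd

# Hypothetical toy version of user_taggedartists: one row per (artist, tag) assignment.
tagged = pd.DataFrame({
    "artistID": [1, 1, 1, 2, 2, 2, 3],
    "tagValue": ["rock", "rock", "indie", "jazz", "jazz", "pop", "rock"],
})


def assign_top_tags(tagged, k=10):
    """Keep tagValue where it is one of the k most frequent tags, else 'N/A'."""
    top_k = tagged["tagValue"].value_counts().index[:k]
    out = tagged.copy()
    # where() keeps values whose condition is True and substitutes 'N/A' elsewhere.
    out["top10TagValue"] = out["tagValue"].where(out["tagValue"].isin(top_k), "N/A")
    return out


def most_common_tag(tagged):
    """Most frequent tagValue for each artist."""
    return (tagged.groupby("artistID")["tagValue"]
                  .agg(lambda s: s.value_counts().index[0])
                  .rename("mostCommonGenre"))


print(assign_top_tags(tagged, k=2))
print(most_common_tag(tagged))
```

With `k=2` only "rock" and "jazz" survive; "indie" and "pop" become "N/A". Because the work happens in pandas' vectorised operations rather than a Python loop, this style should scale far better to the full tag table.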
We require two matrices, or embeddings, to compute a similarity measure (one for queries and one for items), but how do we get these two embeddings?
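As a reminder of what those two embeddings are used for, the sketch below scores a single query embedding against a table of item embeddings with either a dot product or cosine similarity and returns the top-k items. The embeddings are hypothetical toy vectors, and `score_items`/`top_k` are illustrative names. In matrix factorization, the query embedding would be a row of \(U\), the item embeddings rows of \(V\), and the score is the dot product; cosine is shown only for contrast.

```python
import numpy as np


def score_items(query_emb, item_embs, measure="dot"):
    """Similarity between one query embedding and every item embedding (one row per item)."""
    dots = item_embs @ query_emb
    if measure == "dot":
        return dots
    if measure == "cosine":
        norms = np.linalg.norm(item_embs, axis=1) * np.linalg.norm(query_emb)
        return dots / np.maximum(norms, 1e-12)  # guard against zero-length embeddings
    raise ValueError(f"unknown measure: {measure}")


def top_k(query_emb, item_embs, k=2, measure="dot"):
    """Indices of the k highest-scoring items."""
    return np.argsort(-score_items(query_emb, item_embs, measure))[:k]


query = np.array([1.0, 0.0])
items = np.array([[3.0, 3.0],   # large norm, partly aligned with the query
                  [1.0, 0.0],   # small norm, perfectly aligned
                  [0.0, 1.0]])  # orthogonal to the query
print(top_k(query, items, measure="dot"))     # prints [0 1]
print(top_k(query, items, measure="cosine"))  # prints [1 0]
```

The two measures disagree because the dot product rewards embeddings with a large norm, whereas cosine similarity only looks at direction.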
Matrix Factorisation
Figure 2: Data flow chart
First, we need to construct the feedback matrix \(F \in R^{m \times n}\), where \(m\) is the number of users and \(n\) is the number of artists. The goal is to generate two lower-dimensional matrices \(U \in R^{m \times p}\) and \(V \in R^{n \times p}\) (with \(p \ll m\) and \(p \ll n\)), representing latent user and artist components, so that:
$$ F \approx UV^\top $$
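The shapes involved, and why the factorization is so much more compact than \(F\) itself, can be checked with a few lines of NumPy. The toy sizes are arbitrary; the second call plugs in the user and artist counts reported further down in this notebook (1892 and 17632) and the embedding dimension of 35 used for the vanilla model. `entry_and_parameter_counts` is a hypothetical helper name.

```python
import numpy as np


def entry_and_parameter_counts(m, n, p):
    """Entries in the full m x n matrix F versus parameters in U (m x p) and V (n x p)."""
    return m * n, p * (m + n)


rng = np.random.default_rng(0)
m, n, p = 6, 8, 2                  # toy sizes: users, artists, embedding dimension
U = rng.normal(size=(m, p))        # one p-dimensional row per user
V = rng.normal(size=(n, p))        # one p-dimensional row per artist
F_hat = U @ V.T                    # (m, n) approximation of F; its rank is at most p

print(F_hat.shape, np.linalg.matrix_rank(F_hat))
print(entry_and_parameter_counts(1892, 17632, 35))  # prints (33359744, 683340)
```

So a 35-dimensional factorization stores roughly 2% as many numbers as the full user-artist matrix.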
We begin by building the frequency matrix for both the training and test data. tf.SparseTensor is used for efficient representation. A tensor is represented by three separate arguments, namely indices, values and dense_shape, where a value \(A_{ij} = a\) is encoded by setting indices[k] = [i, j] and values[k] = a. The last argument, dense_shape, specifies the shape of the full underlying matrix. Note, as the indices argument holds row and column indices, some pre-processing needs to be performed on artist and user IDs: the IDs should start from 0 and end at \(m-1\) and \(n-1\) for users and artists respectively. Presently, userIDs start at 2. Two dictionaries, orginal_user_ids and orginal_artist_ids, preserve the original IDs for analysis purposes later on. Assertions and print statements are used to ensure the validity of the transformations.
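The re-indexing and the (indices, values, dense_shape) layout described above can be sketched in plain NumPy. `np.unique(..., return_inverse=True)` returns the sorted unique IDs together with each row's 0-based position among them, which is the same mapping `preproces_ids` builds with dictionaries, since it also sorts the IDs. The raw IDs below are hypothetical, and `to_dense` is an illustrative helper used only to check the encoding.

```python
import numpy as np

# Hypothetical raw (userID, artistID) pairs; raw IDs are neither contiguous nor 0-based.
raw_users = np.array([2, 2, 5, 9])
raw_artists = np.array([51, 90, 51, 7])

# Sorted unique IDs (for mapping back) and each row's 0-based index (for the tensor).
user_ids, user_idx = np.unique(raw_users, return_inverse=True)
artist_ids, artist_idx = np.unique(raw_artists, return_inverse=True)

# COO triplet in the tf.SparseTensor layout: indices[k] = [i, j], values[k] = a.
indices = np.stack([user_idx, artist_idx], axis=1).astype(np.int64)
values = np.ones(len(indices), dtype=np.float32)  # binary "listened" indicator
dense_shape = (len(user_ids), len(artist_ids))


def to_dense(indices, values, dense_shape):
    """Materialise the full matrix; unobserved cells are 0 only in this dense view."""
    dense = np.zeros(dense_shape, dtype=values.dtype)
    dense[indices[:, 0], indices[:, 1]] = values
    return dense


print(to_dense(indices, values, dense_shape))
```

Indexing `user_ids` with a new index recovers the original ID, which is the role the two dictionaries play in the notebook.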
colab_filter_df, orginal_user_ids, orginal_artist_ids = preproces_ids(collab_filter_df)
2
colab_filter_df.describe()
|  | userID | artistID | weight |
|---|---|---|---|
| count | 92834.000 | 92834.000 | 92834.000 |
| mean | 944.222 | 3235.737 | 1.000 |
| std | 546.751 | 4197.217 | 0.000 |
| min | 0.000 | 0.000 | 1.000 |
| 25% | 470.000 | 430.000 | 1.000 |
| 50% | 944.000 | 1237.000 | 1.000 |
| 75% | 1416.000 | 4266.000 | 1.000 |
| max | 1891.000 | 17631.000 | 1.000 |
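Next, we calculate the number of unique users and artists, and the sparsity of our proposed frequency matrix, before splitting it into training and test subsets. Note that calculate_sparsity returns the fraction of observed entries, so only about 0.28% of all user-artist pairs are observed (roughly 99.7% of the matrix is empty). Quite a sparse matrix indeed!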
print(f'Number of unique users: {collab_filter_df["userID"].nunique()}')
print(f'Number of unique artists: {collab_filter_df["artistID"].nunique()}')
print(f'Sparsity of our frequency matrix: {calculate_sparsity(collab_filter_df)}')
Number of unique users: 1892
Number of unique artists: 17632
Sparsity of our frequency matrix: 0.002782815119924182
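The printed value can be reproduced by hand from the counts above: 92834 observed rows (the `count` row of the describe table), 1892 users and 17632 artists. The short calculation below also derives the complementary fraction of empty cells and the average number of artists per user.

```python
# Counts reported above.
num_observed = 92834   # observed user-artist pairs (rows in the DataFrame)
m, n = 1892, 17632     # unique users, unique artists

possible_pairs = m * n                   # number of cells in the full frequency matrix
density = num_observed / possible_pairs  # fraction of cells that are observed
sparsity = 1 - density                   # fraction of cells that are empty
artists_per_user = num_observed / m      # average observed artists per user

print(possible_pairs)  # prints 33359744
print(f"{density:.5f} observed, {sparsity:.4f} empty, {artists_per_user:.1f} artists per user")
```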
collab_filter_df.to_csv('data/test_user_artists.csv',index=False)
frequency_m_train, frequency_m_test = split_dataframe(colab_filter_df)
frequency_m_train_tensor = build_music_sparse_tensor(frequency_m_train)
frequency_m_test_tensor = build_music_sparse_tensor(frequency_m_test)
assert num_users == frequency_m_train_tensor.shape.as_list()[0]
assert num_artist == frequency_m_train_tensor.shape.as_list()[1]
assert num_users == frequency_m_test_tensor.shape.as_list()[0]
assert num_artist == frequency_m_test_tensor.shape.as_list()[1]
Training a Matrix factorization model
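Per the definition above, \(UV^\top\) approximates \(F\). The Mean Squared Error (MSE) over the observed entries is used to measure this approximation error. In the notation below, \(k\) denotes the set of observed (user, artist) pairs and \(K = |k|\) is the number of observed pairs:
$$ \text{MSE}(F, UV^\top) = \frac{1}{K} \sum_{(i,j) \in k} \left( F_{ij} - \langle U_i, V_j \rangle \right)^2 $$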
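A small NumPy sketch of this loss follows. It computes the predictions for the observed pairs in two ways: by materialising the whole \(UV^\top\) and picking out the observed cells, and by gathering only the needed rows of \(U\) and \(V\) and taking row-wise dot products, which is the cheaper strategy discussed in the next paragraph. The toy sizes, the observed index list and both function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, p = 5, 7, 3                 # toy sizes: users, artists, embedding dimension
U = rng.normal(size=(m, p))
V = rng.normal(size=(n, p))

# Observed pairs (the set k) and their binary targets F_ij = 1.
indices = np.array([[0, 1], [0, 4], [2, 2], [3, 6], [4, 0]])
values = np.ones(len(indices))


def mse_full(U, V, indices, values):
    """Naive: build the whole m x n matrix U V^T, then read off the observed cells."""
    preds = (U @ V.T)[indices[:, 0], indices[:, 1]]
    return np.mean((values - preds) ** 2)


def mse_gathered(U, V, indices, values):
    """Cheaper: gather the K needed rows of U and V and take row-wise dot products."""
    preds = np.sum(U[indices[:, 0]] * V[indices[:, 1]], axis=1)
    return np.mean((values - preds) ** 2)


print(mse_full(U, V, indices, values), mse_gathered(U, V, indices, values))
```

Both functions return the same number; only the amount of work differs.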
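However, rather than computing the full prediction matrix \(UV^\top\) and then gathering the entries corresponding to the observed listening counts, we gather only the embeddings of the observed pairs and compute their dot products. This reduces the cost from \(O(mnp)\) to \(O(Kp)\), where \(p\) is the embedding dimension. Stochastic gradient descent (SGD) is used to minimize this loss (objective) function. The SGD algorithm cycles through the observed binary listening entries and calculates the prediction and its error according to the following equations:
$$ \hat{F}_{ij} = \langle U_i, V_j \rangle, \qquad e_{ij} = F_{ij} - \hat{F}_{ij} $$
Then it updates the user and artist embeddings as shown in the following equations:
$$ U_i \leftarrow U_i + \alpha \, e_{ij} V_j, \qquad V_j \leftarrow V_j + \alpha \, e_{ij} U_i $$
where \(\alpha\) denotes the learning rate (the constant factor of 2 from differentiating the squared error is absorbed into \(\alpha\)). The algorithm repeats until convergence. Note that the train method below applies tf.train.GradientDescentOptimizer to the loss over all observed training entries at once, i.e. full-batch gradient descent rather than per-entry SGD.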
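The per-entry prediction and update rules can be sketched directly in NumPy. This illustrates textbook per-entry SGD on a toy problem; it is not the TensorFlow training loop used in this notebook, and `sgd_epoch`, `observed_mse`, the sizes and the learning rate are all hypothetical choices.

```python
import numpy as np


def observed_mse(U, V, indices, values):
    """MSE over the observed pairs only, using gathered embedding rows."""
    preds = np.sum(U[indices[:, 0]] * V[indices[:, 1]], axis=1)
    return np.mean((values - preds) ** 2)


def sgd_epoch(U, V, indices, values, lr=0.05, rng=None):
    """One pass of per-entry SGD over the observed pairs; updates U and V in place.

    For each observed (i, j): e = F_ij - <U_i, V_j>, then
    U_i <- U_i + lr * e * V_j and V_j <- V_j + lr * e * U_i (pre-update U_i).
    """
    if rng is None:
        rng = np.random.default_rng()
    for k in rng.permutation(len(indices)):  # visit observed pairs in random order
        i, j = indices[k]
        err = values[k] - U[i] @ V[j]        # prediction error e_ij
        u_old = U[i].copy()
        U[i] += lr * err * V[j]
        V[j] += lr * err * u_old
    return U, V


rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(5, 3))       # toy: 5 users, embedding dimension 3
V = rng.normal(scale=0.1, size=(7, 3))       # toy: 7 artists
indices = np.array([[0, 1], [0, 4], [1, 1], [2, 2], [3, 6], [4, 0]])
values = np.ones(len(indices))               # binary "listened" targets

before = observed_mse(U, V, indices, values)
for _ in range(200):
    sgd_epoch(U, V, indices, values, lr=0.05, rng=rng)
after = observed_mse(U, V, indices, values)
print(f"observed MSE: {before:.3f} -> {after:.3f}")
```

Because only observed pairs are ever visited, this loop on its own drives every prediction towards 1; that is exactly the behaviour the regularization discussed below is meant to counter.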
Other matrix factorization algorithms are also commonly used, such as Alternating Least Squares (ALS) (Takács and Tikk, 2012). A modified version known as Weighted Alternating Least Squares (WALS), which also assigns a weight to the unobserved entries, can be parallelised and typically converges faster than SGD, but it is specific to squared-error objectives, whereas SGD is more flexible. For the purposes of this investigation, we are not particularly concerned with training times or latency requirements, so we proceed with SGD.
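For comparison, the sketch below shows what "alternating" means: with \(V\) held fixed, each user row has a closed-form ridge-regression solution over the artists that user interacted with, and then the roles of \(U\) and \(V\) are swapped. This is plain ALS on the observed entries with an L2 penalty; it is not WALS (which additionally weights the unobserved entries) and is not used anywhere else in this notebook. `als_sweep`, `regularised_loss` and the toy data are hypothetical.

```python
import numpy as np


def regularised_loss(U, V, indices, values, reg=0.1):
    """Squared error on observed pairs plus an L2 penalty on all embeddings."""
    preds = np.sum(U[indices[:, 0]] * V[indices[:, 1]], axis=1)
    return np.sum((values - preds) ** 2) + reg * (np.sum(U ** 2) + np.sum(V ** 2))


def als_sweep(U, V, indices, values, reg=0.1):
    """One ALS sweep: exact ridge solves for every user row, then every artist row."""
    eye = reg * np.eye(U.shape[1])
    for i in range(U.shape[0]):                   # V fixed: solve for each U_i
        mask = indices[:, 0] == i
        if mask.any():
            V_o = V[indices[mask, 1]]             # embeddings of this user's artists
            U[i] = np.linalg.solve(V_o.T @ V_o + eye, V_o.T @ values[mask])
    for j in range(V.shape[0]):                   # U fixed: solve for each V_j
        mask = indices[:, 1] == j
        if mask.any():
            U_o = U[indices[mask, 0]]             # embeddings of this artist's users
            V[j] = np.linalg.solve(U_o.T @ U_o + eye, U_o.T @ values[mask])
    return U, V


rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(5, 3))
V = rng.normal(scale=0.1, size=(7, 3))
indices = np.array([[0, 1], [0, 4], [1, 1], [2, 2], [3, 6], [4, 0]])
values = np.ones(len(indices))

before = regularised_loss(U, V, indices, values)
als_sweep(U, V, indices, values)
after = regularised_loss(U, V, indices, values)
print(f"regularised loss: {before:.3f} -> {after:.3f}")
```

Each row update minimises its own share of the objective exactly, so one sweep can never increase the regularised loss. The per-row solves are independent of each other, which is what makes this family easy to parallelise.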
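We also add regularization to our model to avoid overfitting. Overfitting occurs when the model fits the training dataset too closely and does not generalize well to unseen or future data. In the context of artist recommendation, fitting only the observed listening counts emphasizes learning high similarity (between artists that are listened to by the same users), but a good embedding representation also requires learning low similarity (between artists that are rarely listened to by the same users). The regularized model therefore adds two terms to the observed MSE: an L2 penalty on the user and artist embeddings, and a gravity term that pushes the predicted scores of all user-artist pairs, including unobserved ones, towards zero.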
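In the notation used above, the total loss assembled by build_reg_model is
\(\frac{1}{K}\sum_{(i,j) \in k} (F_{ij} - \langle U_i, V_j \rangle)^2 + \lambda \left( \frac{1}{m}\sum_i \|U_i\|^2 + \frac{1}{n}\sum_j \|V_j\|^2 \right) + \frac{\lambda_g}{mn} \sum_{i,j} \langle U_i, V_j \rangle^2\).
The NumPy sketch below mirrors those three terms and checks the identity the gravity code relies on: the sum of \(\langle U_i, V_j \rangle^2\) over all \(m \times n\) pairs equals the sum of the element-wise product of the two \(p \times p\) Gram matrices \(U^\top U\) and \(V^\top V\), so the full \(UV^\top\) never has to be formed. `loss_components` and the toy data are hypothetical.

```python
import numpy as np


def loss_components(U, V, indices, values, reg_coeff=0.1, gravity_coeff=1.0):
    """NumPy version of the observed, regularization and gravity terms."""
    m, n = U.shape[0], V.shape[0]
    preds = np.sum(U[indices[:, 0]] * V[indices[:, 1]], axis=1)
    observed = np.mean((values - preds) ** 2)
    regularization = reg_coeff * (np.sum(U * U) / m + np.sum(V * V) / n)
    # Gravity: mean squared score over ALL m*n pairs, via p x p Gram matrices.
    gravity = gravity_coeff * np.sum((U.T @ U) * (V.T @ V)) / (m * n)
    return observed, regularization, gravity


rng = np.random.default_rng(1)
U = rng.normal(size=(4, 3))
V = rng.normal(size=(6, 3))
indices = np.array([[0, 0], [1, 2], [3, 5]])
values = np.ones(len(indices))

obs, reg, grav = loss_components(U, V, indices, values)
print(obs, reg, grav, np.mean((U @ V.T) ** 2))  # grav equals the last value
```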
First, we define two classes, train_matrix_norm and build_matrix_norm. The build_matrix_norm class performs the necessary pre-processing steps before we train the model, such as specifying the loss metric to optimise, the loss components (e.g. the gravity loss for the regularized model) and the initial artist and user embeddings. train_matrix_norm simply trains the models and outputs figures detailing the loss metrics and components. The methods build_vanilla() and build_reg_model() perform the necessary pre-processing steps for the non-regularized and regularized models, respectively.
### Training a Matrix Factorization model
class train_matrix_norm(object):
    """Simple class that represents a matrix factorization model"""
    def __init__(self, embedding_vars, loss, metrics=None):
        """Initializes a matrix factorization model
        Args:
          embedding_vars: A dictionary of tf.Variables.
          loss: A float Tensor. The loss to optimize.
          metrics: optional list of dictionaries of Tensors. The metrics in each
            dictionary will be plotted in a separate figure during training.
        """
        self._embedding_vars = embedding_vars
        self._loss = loss
        self._metrics = metrics
        self._embeddings = {k: None for k in embedding_vars}
        self._session = None

    @property
    def embeddings(self):
        """The embeddings dictionary."""
        return self._embeddings

    def train(self, num_iterations=100, learning_rate=1.0, plot_results=True,
              optimizer=tf.train.GradientDescentOptimizer):
        """Trains the model.
        Args:
          num_iterations: number of iterations to run.
          learning_rate: optimizer learning rate.
          plot_results: whether to plot the results at the end of training.
          optimizer: the optimizer to use. Defaults to (full-batch) gradient descent.
        Returns:
          The metrics dictionary evaluated at the last iteration.
        """
        with self._loss.graph.as_default():
            opt = optimizer(learning_rate)
            train_op = opt.minimize(self._loss)
            local_init_op = tf.group(
                tf.variables_initializer(opt.variables()),
                tf.local_variables_initializer())
            if self._session is None:
                self._session = tf.Session()
                with self._session.as_default():
                    self._session.run(tf.global_variables_initializer())
                    self._session.run(tf.tables_initializer())
                    tf.train.start_queue_runners()

        with self._session.as_default():
            local_init_op.run()
            iterations = []
            metrics = self._metrics or ({},)
            # Iterate over `metrics` (not self._metrics) so that metrics=None also works.
            metrics_vals = [collections.defaultdict(list) for _ in metrics]

            # Train and append results.
            for i in range(num_iterations + 1):
                _, results = self._session.run((train_op, metrics))
                if (i % 10 == 0) or i == num_iterations:
                    print("\r iteration %d: " % i + ", ".join(
                        ["%s=%f" % (k, v) for r in results for k, v in r.items()]),
                        end='')
                    iterations.append(i)
                    for metric_val, result in zip(metrics_vals, results):
                        for k, v in result.items():
                            metric_val[k].append(v)

            for k, v in self._embedding_vars.items():
                self._embeddings[k] = v.eval()

            if plot_results:
                # Plot the metrics.
                num_subplots = len(metrics) + 1
                fig = plt.figure()
                fig.set_size_inches(num_subplots * 10, 8)
                for i, metric_vals in enumerate(metrics_vals):
                    ax = fig.add_subplot(1, num_subplots, i + 1)
                    for k, v in metric_vals.items():
                        ax.plot(iterations, v, label=k)
                    ax.set_xlim([1, num_iterations])
                    ax.legend()
            return results
class build_matrix_norm():
    """Collection of static helpers that build matrix factorization models.

    The methods are called on the class itself, e.g.
    build_matrix_norm.build_reg_model(...); each builder returns a
    train_matrix_norm object and uses the global tensors
    frequency_m_train_tensor and frequency_m_test_tensor.
    """

    @staticmethod
    def sparse_mean_square_error(sparse_listens, user_embeddings, artist_embeddings):
        """
        Args:
          sparse_listens: A SparseTensor listening matrix, of dense_shape [m, n].
          user_embeddings: A dense Tensor U of shape [m, p] where p is the embedding
            dimension, such that U_i is the embedding of user i.
          artist_embeddings: A dense Tensor V of shape [n, p] where p is the embedding
            dimension, such that V_j is the embedding of artist j.
        Returns:
          A scalar Tensor representing the MSE between the observed entries and the
          model's predictions.
        """
        # Gather only the embeddings of the observed (user, artist) pairs and take
        # their row-wise dot products, instead of materialising the full U V^T.
        predictions = tf.reduce_sum(
            tf.gather(user_embeddings, sparse_listens.indices[:, 0]) *
            tf.gather(artist_embeddings, sparse_listens.indices[:, 1]),
            axis=1)
        loss = tf.losses.mean_squared_error(sparse_listens.values, predictions)
        return loss

    @staticmethod
    def gravity(U, V):
        """Creates a gravity loss given two embedding matrices."""
        return 1. / (U.shape[0].value * V.shape[0].value) * tf.reduce_sum(
            tf.matmul(U, U, transpose_a=True) * tf.matmul(V, V, transpose_a=True))

    @staticmethod
    def build_vanilla(embedding_dim=3, init_stddev=1.):
        """Performs the necessary preprocessing steps for the non-regularized model."""
        # Initialize the embeddings using a normal distribution.
        U = tf.Variable(tf.random.normal(
            [frequency_m_train_tensor.dense_shape[0], embedding_dim], stddev=init_stddev))
        V = tf.Variable(tf.random.normal(
            [frequency_m_train_tensor.dense_shape[1], embedding_dim], stddev=init_stddev))
        embeddings = {"userID": U, "artistID": V}
        error_train = build_matrix_norm.sparse_mean_square_error(frequency_m_train_tensor, U, V)
        error_test = build_matrix_norm.sparse_mean_square_error(frequency_m_test_tensor, U, V)
        metrics = {
            'train_error': error_train,
            'test_error': error_test
        }
        return train_matrix_norm(embeddings, error_train, [metrics])

    @staticmethod
    def build_reg_model(embedding_dim=3, regularization_coeff=.1, gravity_coeff=1.,
                        init_stddev=0.1):
        """Performs the necessary preprocessing steps for the regularized model.
        Args:
          embedding_dim: The dimension of the embedding space.
          regularization_coeff: The regularization coefficient lambda.
          gravity_coeff: The gravity regularization coefficient lambda_g.
          init_stddev: The standard deviation of the initial embeddings.
        Returns:
          A train_matrix_norm object that uses a regularized loss.
        """
        U = tf.Variable(tf.random.normal(
            [frequency_m_train_tensor.dense_shape[0], embedding_dim], stddev=init_stddev))
        V = tf.Variable(tf.random.normal(
            [frequency_m_train_tensor.dense_shape[1], embedding_dim], stddev=init_stddev))
        embeddings = {"userID": U, "artistID": V}
        error_train = build_matrix_norm.sparse_mean_square_error(frequency_m_train_tensor, U, V)
        error_test = build_matrix_norm.sparse_mean_square_error(frequency_m_test_tensor, U, V)
        gravity_loss = gravity_coeff * build_matrix_norm.gravity(U, V)
        regularization_loss = regularization_coeff * (
            tf.reduce_sum(U * U) / U.shape[0].value + tf.reduce_sum(V * V) / V.shape[0].value)
        total_loss = error_train + regularization_loss + gravity_loss
        losses = {
            'train_error_observed': error_train,
            'test_error_observed': error_test,
        }
        loss_components = {
            'observed_loss': error_train,
            'regularization_loss': regularization_loss,
            'gravity_loss': gravity_loss,
        }
        return train_matrix_norm(embeddings, total_loss, [losses, loss_components])
Vanilla Model (non-regularized)
vanilla_model = build_matrix_norm.build_vanilla(embedding_dim=35,init_stddev=.05)
vanilla_model.train(num_iterations=2000, learning_rate=20.)
iteration 0: train_error=1.000227, test_error=1.000253
iteration 10: train_error=0.998526, test_error=1.000226
iteration 20: train_error=0.996735, test_error=1.000124
iteration 30: train_error=0.994730, test_error=0.999840
iteration 40: train_error=0.992304, test_error=0.999192
iteration 50: train_error=0.989093, test_error=0.997844
iteration 60: train_error=0.984439, test_error=0.995181
iteration 70: train_error=0.977182, test_error=0.990101
iteration 80: train_error=0.965441, test_error=0.980783
iteration 90: train_error=0.946653, test_error=0.964688
iteration 100: train_error=0.918552, test_error=0.939434
iteration 110: train_error=0.881324, test_error=0.904881
iteration 120: train_error=0.838524, test_error=0.864285
iteration 130: train_error=0.794060, test_error=0.821610
iteration 140: train_error=0.749451, test_error=0.778546
iteration 150: train_error=0.705176, test_error=0.735538
iteration 160: train_error=0.662150, test_error=0.693428
iteration 170: train_error=0.621414, test_error=0.653329
iteration 180: train_error=0.583612, test_error=0.616041
iteration 190: train_error=0.548947, test_error=0.581904
iteration 200: train_error=0.517337, test_error=0.550920
iteration 210: train_error=0.488553, test_error=0.522895
iteration 220: train_error=0.462309, test_error=0.497551
iteration 230: train_error=0.438313, test_error=0.474591
iteration 240: train_error=0.416298, test_error=0.453736
iteration 250: train_error=0.396026, test_error=0.434737
iteration 260: train_error=0.377290, test_error=0.417378
iteration 270: train_error=0.359911, test_error=0.401471
iteration 280: train_error=0.343738, test_error=0.386853
iteration 290: train_error=0.328637, test_error=0.373384
iteration 300: train_error=0.314496, test_error=0.360940
iteration 310: train_error=0.301218, test_error=0.349415
iteration 320: train_error=0.288720, test_error=0.338716
iteration 330: train_error=0.276931, test_error=0.328762
iteration 340: train_error=0.265788, test_error=0.319483
iteration 350: train_error=0.255239, test_error=0.310817
iteration 360: train_error=0.245237, test_error=0.302710
iteration 370: train_error=0.235739, test_error=0.295116
iteration 380: train_error=0.226710, test_error=0.287991
iteration 390: train_error=0.218114, test_error=0.281299
iteration 400: train_error=0.209923, test_error=0.275006
iteration 410: train_error=0.202108, test_error=0.269081
iteration 420: train_error=0.194645, test_error=0.263498
iteration 430: train_error=0.187509, test_error=0.258230
iteration 440: train_error=0.180679, test_error=0.253255
iteration 450: train_error=0.174136, test_error=0.248552
iteration 460: train_error=0.167861, test_error=0.244101
iteration 470: train_error=0.161839, test_error=0.239885
iteration 480: train_error=0.156052, test_error=0.235888
iteration 490: train_error=0.150488, test_error=0.232094
iteration 500: train_error=0.145133, test_error=0.228489
iteration 510: train_error=0.139977, test_error=0.225062
iteration 520: train_error=0.135007, test_error=0.221801
iteration 530: train_error=0.130215, test_error=0.218694
iteration 540: train_error=0.125591, test_error=0.215732
iteration 550: train_error=0.121127, test_error=0.212907
iteration 560: train_error=0.116817, test_error=0.210210
iteration 570: train_error=0.112652, test_error=0.207633
iteration 580: train_error=0.108628, test_error=0.205169
iteration 590: train_error=0.104738, test_error=0.202813
iteration 600: train_error=0.100977, test_error=0.200557
iteration 610: train_error=0.097341, test_error=0.198396
iteration 620: train_error=0.093825, test_error=0.196326
iteration 630: train_error=0.090425, test_error=0.194341
iteration 640: train_error=0.087137, test_error=0.192437
iteration 650: train_error=0.083958, test_error=0.190610
iteration 660: train_error=0.080884, test_error=0.188856
iteration 670: train_error=0.077913, test_error=0.187172
iteration 680: train_error=0.075042, test_error=0.185554
iteration 690: train_error=0.072267, test_error=0.183999
iteration 700: train_error=0.069587, test_error=0.182503
iteration 710: train_error=0.066998, test_error=0.181066
iteration 720: train_error=0.064499, test_error=0.179683
iteration 730: train_error=0.062086, test_error=0.178352
iteration 740: train_error=0.059758, test_error=0.177071
iteration 750: train_error=0.057513, test_error=0.175839
iteration 760: train_error=0.055348, test_error=0.174652
iteration 770: train_error=0.053261, test_error=0.173508
iteration 780: train_error=0.051250, test_error=0.172407
iteration 790: train_error=0.049313, test_error=0.171346
iteration 800: train_error=0.047447, test_error=0.170324
iteration 810: train_error=0.045651, test_error=0.169339
iteration 820: train_error=0.043923, test_error=0.168389
iteration 830: train_error=0.042260, test_error=0.167473
iteration 840: train_error=0.040661, test_error=0.166590
iteration 850: train_error=0.039123, test_error=0.165739
iteration 860: train_error=0.037645, test_error=0.164918
iteration 870: train_error=0.036225, test_error=0.164126
iteration 880: train_error=0.034860, test_error=0.163361
iteration 890: train_error=0.033549, test_error=0.162624
iteration 900: train_error=0.032289, test_error=0.161912
iteration 910: train_error=0.031080, test_error=0.161225
iteration 920: train_error=0.029919, test_error=0.160562
iteration 930: train_error=0.028804, test_error=0.159922
iteration 940: train_error=0.027733, test_error=0.159304
iteration 950: train_error=0.026706, test_error=0.158707
iteration 960: train_error=0.025719, test_error=0.158130
iteration 970: train_error=0.024772, test_error=0.157574
iteration 980: train_error=0.023863, test_error=0.157036
iteration 990: train_error=0.022991, test_error=0.156516
iteration 1000: train_error=0.022153, test_error=0.156014
iteration 1010: train_error=0.021349, test_error=0.155528
iteration 1020: train_error=0.020578, test_error=0.155059
iteration 1030: train_error=0.019837, test_error=0.154605
iteration 1040: train_error=0.019125, test_error=0.154167
iteration 1050: train_error=0.018442, test_error=0.153743
iteration 1060: train_error=0.017786, test_error=0.153332
iteration 1070: train_error=0.017156, test_error=0.152936
iteration 1080: train_error=0.016551, test_error=0.152552
iteration 1090: train_error=0.015969, test_error=0.152181
iteration 1100: train_error=0.015411, test_error=0.151821
iteration 1110: train_error=0.014874, test_error=0.151474
iteration 1120: train_error=0.014358, test_error=0.151137
iteration 1130: train_error=0.013862, test_error=0.150811
iteration 1140: train_error=0.013386, test_error=0.150495
iteration 1150: train_error=0.012928, test_error=0.150190
iteration 1160: train_error=0.012487, test_error=0.149894
iteration 1170: train_error=0.012064, test_error=0.149607
iteration 1180: train_error=0.011656, test_error=0.149330
iteration 1190: train_error=0.011265, test_error=0.149061
iteration 1200: train_error=0.010888, test_error=0.148800
iteration 1210: train_error=0.010525, test_error=0.148547
iteration 1220: train_error=0.010176, test_error=0.148302
iteration 1230: train_error=0.009840, test_error=0.148065
iteration 1240: train_error=0.009517, test_error=0.147834
iteration 1250: train_error=0.009206, test_error=0.147611
iteration 1260: train_error=0.008906, test_error=0.147395
iteration 1270: train_error=0.008618, test_error=0.147184
iteration 1280: train_error=0.008340, test_error=0.146981
iteration 1290: train_error=0.008073, test_error=0.146783
iteration 1300: train_error=0.007815, test_error=0.146591
iteration 1310: train_error=0.007567, test_error=0.146404
iteration 1320: train_error=0.007328, test_error=0.146224
iteration 1330: train_error=0.007097, test_error=0.146048
iteration 1340: train_error=0.006875, test_error=0.145877
iteration 1350: train_error=0.006661, test_error=0.145712
iteration 1360: train_error=0.006455, test_error=0.145551
iteration 1370: train_error=0.006256, test_error=0.145395
iteration 1380: train_error=0.006064, test_error=0.145243
iteration 1390: train_error=0.005879, test_error=0.145095
iteration 1400: train_error=0.005701, test_error=0.144952
iteration 1410: train_error=0.005529, test_error=0.144813
iteration 1420: train_error=0.005363, test_error=0.144677
iteration 1430: train_error=0.005203, test_error=0.144545
iteration 1440: train_error=0.005049, test_error=0.144417
iteration 1450: train_error=0.004900, test_error=0.144293
iteration 1460: train_error=0.004756, test_error=0.144171
iteration 1470: train_error=0.004618, test_error=0.144054
iteration 1480: train_error=0.004484, test_error=0.143939
iteration 1490: train_error=0.004355, test_error=0.143827
iteration 1500: train_error=0.004230, test_error=0.143718
iteration 1510: train_error=0.004110, test_error=0.143613
iteration 1520: train_error=0.003993, test_error=0.143510
iteration 1530: train_error=0.003881, test_error=0.143409
iteration 1540: train_error=0.003773, test_error=0.143312
iteration 1550: train_error=0.003668, test_error=0.143217
iteration 1560: train_error=0.003567, test_error=0.143124
iteration 1570: train_error=0.003469, test_error=0.143034
iteration 1580: train_error=0.003374, test_error=0.142946
iteration 1590: train_error=0.003283, test_error=0.142860
iteration 1600: train_error=0.003195, test_error=0.142776
iteration 1610: train_error=0.003110, test_error=0.142695
iteration 1620: train_error=0.003027, test_error=0.142616
iteration 1630: train_error=0.002948, test_error=0.142538
iteration 1640: train_error=0.002871, test_error=0.142463
iteration 1650: train_error=0.002796, test_error=0.142389
iteration 1660: train_error=0.002724, test_error=0.142317
iteration 1670: train_error=0.002654, test_error=0.142247
iteration 1680: train_error=0.002587, test_error=0.142179
iteration 1690: train_error=0.002522, test_error=0.142112
iteration 1700: train_error=0.002459, test_error=0.142047
iteration 1710: train_error=0.002398, test_error=0.141983
iteration 1720: train_error=0.002338, test_error=0.141921
iteration 1730: train_error=0.002281, test_error=0.141860
iteration 1740: train_error=0.002226, test_error=0.141801
iteration 1750: train_error=0.002172, test_error=0.141743
iteration 1760: train_error=0.002120, test_error=0.141687
iteration 1770: train_error=0.002070, test_error=0.141632
iteration 1780: train_error=0.002022, test_error=0.141578
iteration 1790: train_error=0.001974, test_error=0.141525
iteration 1800: train_error=0.001929, test_error=0.141473
iteration 1810: train_error=0.001885, test_error=0.141423
iteration 1820: train_error=0.001842, test_error=0.141374
iteration 1830: train_error=0.001800, test_error=0.141326
iteration 1840: train_error=0.001760, test_error=0.141278
iteration 1850: train_error=0.001721, test_error=0.141232
iteration 1860: train_error=0.001683, test_error=0.141187
iteration 1870: train_error=0.001646, test_error=0.141143
iteration 1880: train_error=0.001611, test_error=0.141100
iteration 1890: train_error=0.001576, test_error=0.141058
iteration 1900: train_error=0.001543, test_error=0.141017
iteration 1910: train_error=0.001510, test_error=0.140976
iteration 1920: train_error=0.001479, test_error=0.140937
iteration 1930: train_error=0.001448, test_error=0.140898
iteration 1940: train_error=0.001419, test_error=0.140860
iteration 1950: train_error=0.001390, test_error=0.140823
iteration 1960: train_error=0.001362, test_error=0.140786
iteration 1970: train_error=0.001335, test_error=0.140750
iteration 1980: train_error=0.001309, test_error=0.140716
iteration 1990: train_error=0.001283, test_error=0.140681
iteration 2000: train_error=0.001258, test_error=0.140648
[{'train_error': 0.001258304, 'test_error': 0.14064766}]
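The unregularized run ends with a training error of about 0.0013 and a test error of about 0.1406. The model fits the training entries almost perfectly but generalizes far less well to held-out entries, which suggests it is overfitting the observed interactions. To make these numbers concrete, here is a minimal NumPy sketch of an error computed only over observed (user, item) pairs. It assumes that `train_error` and `test_error` are mean squared errors over the observed entries of each split. The function `sparse_mse` and the toy data are hypothetical and are not the notebook's own implementation.

```python
import numpy as np

def sparse_mse(U, V, rows, cols, values):
    """Mean squared error of U @ V.T restricted to observed (row, col) entries.

    U: (num_users, k) user embeddings; V: (num_items, k) item embeddings.
    rows, cols, values: coordinates and target values of the observed entries.
    """
    # Score only the observed pairs; the dense num_users x num_items product is never built.
    preds = np.sum(U[rows] * V[cols], axis=1)
    return float(np.mean((values - preds) ** 2))

# Hypothetical toy data: 3 users, 4 items, embedding dimension 2.
rng = np.random.default_rng(0)
U = rng.normal(scale=0.5, size=(3, 2))
V = rng.normal(scale=0.5, size=(4, 2))
train = (np.array([0, 1, 2]), np.array([1, 3, 0]), np.array([1.0, 1.0, 1.0]))
test = (np.array([0, 2]), np.array([2, 3]), np.array([1.0, 1.0]))
print("train MSE:", sparse_mse(U, V, *train))
print("test MSE:", sparse_mse(U, V, *test))
```

Scoring only the observed pairs matters because the user-item matrix is sparse. Materializing the full product just to read a few entries would waste memory on real-sized data.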
Regularized model
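The training log for this model reports three components: `observed_loss`, `regularization_loss` and `gravity_loss`. The cell does not show how `build_matrix_norm.build_reg_model` combines them. The following minimal NumPy sketch therefore rests on three assumptions. First, the total objective is the sum of the three terms. Second, the regularization term is `regularization_coeff` times the average squared embedding norm per user plus the same average per item. Third, the "gravity" term is `gravity_coeff` times the mean squared predicted score over all user-item pairs, observed or not. The function name `regularized_loss` is hypothetical.

```python
import numpy as np

def regularized_loss(U, V, rows, cols, values, regularization_coeff=0.1, gravity_coeff=1.0):
    """Sketch of observed MSE + L2 regularization + gravity (weighting is an assumption)."""
    # Observed loss: squared error on the observed (row, col) entries only.
    preds = np.sum(U[rows] * V[cols], axis=1)
    observed_loss = np.mean((values - preds) ** 2)
    # Regularization: average squared embedding norm per user and per item, scaled.
    regularization_loss = regularization_coeff * (
        np.sum(U * U) / U.shape[0] + np.sum(V * V) / V.shape[0]
    )
    # Gravity: mean of ALL squared scores, i.e. ||U V^T||_F^2 / (num_users * num_items).
    # sum((U^T U) * (V^T V)) equals ||U V^T||_F^2 without forming the dense U V^T.
    gravity = np.sum((U.T @ U) * (V.T @ V)) / (U.shape[0] * V.shape[0])
    gravity_loss = gravity_coeff * gravity
    total = observed_loss + regularization_loss + gravity_loss
    return total, observed_loss, regularization_loss, gravity_loss

# Hypothetical toy data using the cell's embedding_dim=35 and init_stddev=0.05.
rng = np.random.default_rng(0)
U = rng.normal(scale=0.05, size=(6, 35))  # 6 hypothetical users
V = rng.normal(scale=0.05, size=(8, 35))  # 8 hypothetical items
rows, cols, values = np.array([0, 3, 5]), np.array([1, 4, 7]), np.ones(3)
total, observed, reg, grav = regularized_loss(U, V, rows, cols, values)
print(f"observed={observed:.6f} regularization={reg:.6f} gravity={grav:.6f} total={total:.6f}")
```

This weighting is at least consistent with the first logged line. Suppose embeddings start i.i.d. normal with standard deviation `init_stddev`. Each squared norm then has expectation 35 × 0.05² = 0.0875, so the regularization term starts near 0.1 × (0.0875 + 0.0875) = 0.0175, close to the logged 0.017468. Each squared score has expectation 35 × 0.05⁴ ≈ 0.000219, close to the logged gravity of 0.000218.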
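# Matrix factorization with two penalties: regularization_coeff and gravity_coeff weight the regularization and gravity terms.
reg_model = build_matrix_norm.build_reg_model(regularization_coeff=0.1, gravity_coeff=1.0,
                                              embedding_dim=35, init_stddev=0.05)
reg_model.train(num_iterations=2000, learning_rate=20.)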
iteration 0: train_error_observed=1.000023, test_error_observed=0.999826, observed_loss=1.000023, regularization_loss=0.017468, gravity_loss=0.000218
iteration 10: train_error_observed=0.998364, test_error_observed=0.999792, observed_loss=0.998364, regularization_loss=0.017070, gravity_loss=0.000208
iteration 20: train_error_observed=0.996703, test_error_observed=0.999693, observed_loss=0.996703, regularization_loss=0.016727, gravity_loss=0.000200
iteration 30: train_error_observed=0.994921, test_error_observed=0.999432, observed_loss=0.994921, regularization_loss=0.016436, gravity_loss=0.000192
iteration 40: train_error_observed=0.992834, test_error_observed=0.998848, observed_loss=0.992834, regularization_loss=0.016200, gravity_loss=0.000187
iteration 50: train_error_observed=0.990130, test_error_observed=0.997656, observed_loss=0.990130, regularization_loss=0.016027, gravity_loss=0.000183
iteration 60: train_error_observed=0.986250, test_error_observed=0.995338, observed_loss=0.986250, regularization_loss=0.015933, gravity_loss=0.000180
iteration 70: train_error_observed=0.980224, test_error_observed=0.990973, observed_loss=0.980224, regularization_loss=0.015951, gravity_loss=0.000181
iteration 80: train_error_observed=0.970457, test_error_observed=0.983033, observed_loss=0.970457, regularization_loss=0.016139, gravity_loss=0.000188
iteration 90: train_error_observed=0.954685, test_error_observed=0.969305, observed_loss=0.954685, regularization_loss=0.016587, gravity_loss=0.000206
iteration 100: train_error_observed=0.930580, test_error_observed=0.947421, observed_loss=0.930580, regularization_loss=0.017418, gravity_loss=0.000247
iteration 110: train_error_observed=0.897485, test_error_observed=0.916504, observed_loss=0.897485, regularization_loss=0.018743, gravity_loss=0.000334
iteration 120: train_error_observed=0.857936, test_error_observed=0.878784, observed_loss=0.857936, regularization_loss=0.020579, gravity_loss=0.000501
iteration 130: train_error_observed=0.816035, test_error_observed=0.838298, observed_loss=0.816035, regularization_loss=0.022820, gravity_loss=0.000774
iteration 140: train_error_observed=0.774204, test_error_observed=0.797634, observed_loss=0.774204, regularization_loss=0.025316, gravity_loss=0.001170
iteration 150: train_error_observed=0.733110, test_error_observed=0.757526, observed_loss=0.733110, regularization_loss=0.027972, gravity_loss=0.001697
iteration 160: train_error_observed=0.693308, test_error_observed=0.718463, observed_loss=0.693308, regularization_loss=0.030736, gravity_loss=0.002367
iteration 170: train_error_observed=0.655570, test_error_observed=0.681216, observed_loss=0.655570, regularization_loss=0.033558, gravity_loss=0.003184
iteration 180: train_error_observed=0.620484, test_error_observed=0.646454, observed_loss=0.620484, regularization_loss=0.036381, gravity_loss=0.004142
iteration 190: train_error_observed=0.588288, test_error_observed=0.614522, observed_loss=0.588288, regularization_loss=0.039155, gravity_loss=0.005227
iteration 200: train_error_observed=0.558957, test_error_observed=0.585471, observed_loss=0.558957, regularization_loss=0.041842, gravity_loss=0.006420
iteration 210: train_error_observed=0.532322, test_error_observed=0.559171, observed_loss=0.532322, regularization_loss=0.044417, gravity_loss=0.007703
iteration 220: train_error_observed=0.508147, test_error_observed=0.535407, observed_loss=0.508147, regularization_loss=0.046867, gravity_loss=0.009057
iteration 230: train_error_observed=0.486184, test_error_observed=0.513930, observed_loss=0.486184, regularization_loss=0.049187, gravity_loss=0.010466
iteration 240: train_error_observed=0.466191, test_error_observed=0.494496, observed_loss=0.466191, regularization_loss=0.051377, gravity_loss=0.011915
iteration 250: train_error_observed=0.447949, test_error_observed=0.476880, observed_loss=0.447949, regularization_loss=0.053439, gravity_loss=0.013392
iteration 260: train_error_observed=0.431263, test_error_observed=0.460881, observed_loss=0.431263, regularization_loss=0.055378, gravity_loss=0.014885
iteration 270: train_error_observed=0.415961, test_error_observed=0.446321, observed_loss=0.415961, regularization_loss=0.057200, gravity_loss=0.016386
iteration 280: train_error_observed=0.401893, test_error_observed=0.433043, observed_loss=0.401893, regularization_loss=0.058910, gravity_loss=0.017886
iteration 290: train_error_observed=0.388928, test_error_observed=0.420911, observed_loss=0.388928, regularization_loss=0.060515, gravity_loss=0.019379
iteration 300: train_error_observed=0.376949, test_error_observed=0.409802, observed_loss=0.376949, regularization_loss=0.062021, gravity_loss=0.020858
iteration 310: train_error_observed=0.365857, test_error_observed=0.399610, observed_loss=0.365857, regularization_loss=0.063433, gravity_loss=0.022318
iteration 320: train_error_observed=0.355564, test_error_observed=0.390240, observed_loss=0.355564, regularization_loss=0.064759, gravity_loss=0.023756
iteration 330: train_error_observed=0.345990, test_error_observed=0.381608, observed_loss=0.345990, regularization_loss=0.066002, gravity_loss=0.025167
iteration 340: train_error_observed=0.337068, test_error_observed=0.373640, observed_loss=0.337068, regularization_loss=0.067169, gravity_loss=0.026549
iteration 350: train_error_observed=0.328738, test_error_observed=0.366272, observed_loss=0.328738, regularization_loss=0.068264, gravity_loss=0.027900
iteration 360: train_error_observed=0.320945, test_error_observed=0.359446, observed_loss=0.320945, regularization_loss=0.069292, gravity_loss=0.029217
iteration 370: train_error_observed=0.313642, test_error_observed=0.353111, observed_loss=0.313642, regularization_loss=0.070257, gravity_loss=0.030498
iteration 380: train_error_observed=0.306788, test_error_observed=0.347222, observed_loss=0.306788, regularization_loss=0.071165, gravity_loss=0.031744
iteration 390: train_error_observed=0.300343, test_error_observed=0.341739, observed_loss=0.300343, regularization_loss=0.072018, gravity_loss=0.032952
iteration 400: train_error_observed=0.294275, test_error_observed=0.336627, observed_loss=0.294275, regularization_loss=0.072821, gravity_loss=0.034123
iteration 410: train_error_observed=0.288553, test_error_observed=0.331853, observed_loss=0.288553, regularization_loss=0.073577, gravity_loss=0.035256
iteration 420: train_error_observed=0.283150, test_error_observed=0.327390, observed_loss=0.283150, regularization_loss=0.074288, gravity_loss=0.036350
iteration 430: train_error_observed=0.278040, test_error_observed=0.323212, observed_loss=0.278040, regularization_loss=0.074960, gravity_loss=0.037407
iteration 440: train_error_observed=0.273203, test_error_observed=0.319297, observed_loss=0.273203, regularization_loss=0.075593, gravity_loss=0.038425
iteration 450: train_error_observed=0.268616, test_error_observed=0.315622, observed_loss=0.268616, regularization_loss=0.076191, gravity_loss=0.039406
iteration 460: train_error_observed=0.264261, test_error_observed=0.312171, observed_loss=0.264261, regularization_loss=0.076756, gravity_loss=0.040350
iteration 470: train_error_observed=0.260123, test_error_observed=0.308926, observed_loss=0.260123, regularization_loss=0.077292, gravity_loss=0.041257
iteration 480: train_error_observed=0.256184, test_error_observed=0.305871, observed_loss=0.256184, regularization_loss=0.077799, gravity_loss=0.042128
iteration 490: train_error_observed=0.252430, test_error_observed=0.302993, observed_loss=0.252430, regularization_loss=0.078281, gravity_loss=0.042964
iteration 500: train_error_observed=0.248849, test_error_observed=0.300278, observed_loss=0.248849, regularization_loss=0.078738, gravity_loss=0.043765
iteration 510: train_error_observed=0.245429, test_error_observed=0.297715, observed_loss=0.245429, regularization_loss=0.079174, gravity_loss=0.044532
iteration 520: train_error_observed=0.242157, test_error_observed=0.295293, observed_loss=0.242157, regularization_loss=0.079589, gravity_loss=0.045266
iteration 530: train_error_observed=0.239025, test_error_observed=0.293002, observed_loss=0.239025, regularization_loss=0.079986, gravity_loss=0.045969
iteration 540: train_error_observed=0.236021, test_error_observed=0.290832, observed_loss=0.236021, regularization_loss=0.080366, gravity_loss=0.046639
iteration 550: train_error_observed=0.233138, test_error_observed=0.288777, observed_loss=0.233138, regularization_loss=0.080730, gravity_loss=0.047280
iteration 560: train_error_observed=0.230368, test_error_observed=0.286827, observed_loss=0.230368, regularization_loss=0.081080, gravity_loss=0.047891
iteration 570: train_error_observed=0.227703, test_error_observed=0.284976, observed_loss=0.227703, regularization_loss=0.081417, gravity_loss=0.048474
iteration 580: train_error_observed=0.225135, test_error_observed=0.283217, observed_loss=0.225135, regularization_loss=0.081742, gravity_loss=0.049029
iteration 590: train_error_observed=0.222659, test_error_observed=0.281544, observed_loss=0.222659, regularization_loss=0.082057, gravity_loss=0.049557
iteration 600: train_error_observed=0.220269, test_error_observed=0.279952, observed_loss=0.220269, regularization_loss=0.082362, gravity_loss=0.050060
iteration 610: train_error_observed=0.217959, test_error_observed=0.278434, observed_loss=0.217959, regularization_loss=0.082659, gravity_loss=0.050537
iteration 620: train_error_observed=0.215723, test_error_observed=0.276988, observed_loss=0.215723, regularization_loss=0.082948, gravity_loss=0.050991
iteration 630: train_error_observed=0.213558, test_error_observed=0.275607, observed_loss=0.213558, regularization_loss=0.083230, gravity_loss=0.051421
iteration 640: train_error_observed=0.211459, test_error_observed=0.274289, observed_loss=0.211459, regularization_loss=0.083506, gravity_loss=0.051829
iteration 650: train_error_observed=0.209421, test_error_observed=0.273028, observed_loss=0.209421, regularization_loss=0.083777, gravity_loss=0.052216
iteration 660: train_error_observed=0.207441, test_error_observed=0.271822, observed_loss=0.207441, regularization_loss=0.084043, gravity_loss=0.052582
iteration 670: train_error_observed=0.205515, test_error_observed=0.270668, observed_loss=0.205515, regularization_loss=0.084305, gravity_loss=0.052928
iteration 680: train_error_observed=0.203640, test_error_observed=0.269562, observed_loss=0.203640, regularization_loss=0.084564, gravity_loss=0.053254
iteration 690: train_error_observed=0.201813, test_error_observed=0.268501, observed_loss=0.201813, regularization_loss=0.084819, gravity_loss=0.053562
iteration 700: train_error_observed=0.200031, test_error_observed=0.267483, observed_loss=0.200031, regularization_loss=0.085073, gravity_loss=0.053852
iteration 710: train_error_observed=0.198292, test_error_observed=0.266505, observed_loss=0.198292, regularization_loss=0.085324, gravity_loss=0.054125
iteration 720: train_error_observed=0.196593, test_error_observed=0.265566, observed_loss=0.196593, regularization_loss=0.085574, gravity_loss=0.054382
iteration 730: train_error_observed=0.194931, test_error_observed=0.264662, observed_loss=0.194931, regularization_loss=0.085823, gravity_loss=0.054623
iteration 740: train_error_observed=0.193305, test_error_observed=0.263793, observed_loss=0.193305, regularization_loss=0.086071, gravity_loss=0.054848
iteration 750: train_error_observed=0.191713, test_error_observed=0.262955, observed_loss=0.191713, regularization_loss=0.086318, gravity_loss=0.055059
iteration 760: train_error_observed=0.190152, test_error_observed=0.262148, observed_loss=0.190152, regularization_loss=0.086565, gravity_loss=0.055256
iteration 770: train_error_observed=0.188622, test_error_observed=0.261370, observed_loss=0.188622, regularization_loss=0.086813, gravity_loss=0.055439
iteration 780: train_error_observed=0.187120, test_error_observed=0.260619, observed_loss=0.187120, regularization_loss=0.087060, gravity_loss=0.055609
iteration 790: train_error_observed=0.185645, test_error_observed=0.259895, observed_loss=0.185645, regularization_loss=0.087308, gravity_loss=0.055766
iteration 800: train_error_observed=0.184196, test_error_observed=0.259194, observed_loss=0.184196, regularization_loss=0.087556, gravity_loss=0.055912
iteration 810: train_error_observed=0.182772, test_error_observed=0.258517, observed_loss=0.182772, regularization_loss=0.087806, gravity_loss=0.056046
iteration 820: train_error_observed=0.181370, test_error_observed=0.257863, observed_loss=0.181370, regularization_loss=0.088056, gravity_loss=0.056169
iteration 830: train_error_observed=0.179991, test_error_observed=0.257229, observed_loss=0.179991, regularization_loss=0.088307, gravity_loss=0.056281
iteration 840: train_error_observed=0.178633, test_error_observed=0.256616, observed_loss=0.178633, regularization_loss=0.088560, gravity_loss=0.056383
iteration 850: train_error_observed=0.177295, test_error_observed=0.256021, observed_loss=0.177295, regularization_loss=0.088814, gravity_loss=0.056475
iteration 860: train_error_observed=0.175977, test_error_observed=0.255446, observed_loss=0.175977, regularization_loss=0.089069, gravity_loss=0.056558
iteration 870: train_error_observed=0.174677, test_error_observed=0.254887, observed_loss=0.174677, regularization_loss=0.089326, gravity_loss=0.056632
iteration 880: train_error_observed=0.173395, test_error_observed=0.254345, observed_loss=0.173395, regularization_loss=0.089583, gravity_loss=0.056697
iteration 890: train_error_observed=0.172130, test_error_observed=0.253820, observed_loss=0.172130, regularization_loss=0.089843, gravity_loss=0.056754
iteration 900: train_error_observed=0.170882, test_error_observed=0.253309, observed_loss=0.170882, regularization_loss=0.090104, gravity_loss=0.056802
iteration 910: train_error_observed=0.169650, test_error_observed=0.252813, observed_loss=0.169650, regularization_loss=0.090366, gravity_loss=0.056843
iteration 920: train_error_observed=0.168433, test_error_observed=0.252331, observed_loss=0.168433, regularization_loss=0.090630, gravity_loss=0.056877
iteration 930: train_error_observed=0.167231, test_error_observed=0.251863, observed_loss=0.167231, regularization_loss=0.090895, gravity_loss=0.056904
iteration 940: train_error_observed=0.166044, test_error_observed=0.251408, observed_loss=0.166044, regularization_loss=0.091162, gravity_loss=0.056924
iteration 950: train_error_observed=0.164870, test_error_observed=0.250964, observed_loss=0.164870, regularization_loss=0.091430, gravity_loss=0.056937
iteration 960: train_error_observed=0.163711, test_error_observed=0.250533, observed_loss=0.163711, regularization_loss=0.091700, gravity_loss=0.056944
iteration 970: train_error_observed=0.162564, test_error_observed=0.250113, observed_loss=0.162564, regularization_loss=0.091971, gravity_loss=0.056946
iteration 980: train_error_observed=0.161431, test_error_observed=0.249705, observed_loss=0.161431, regularization_loss=0.092243, gravity_loss=0.056941
iteration 990: train_error_observed=0.160310, test_error_observed=0.249306, observed_loss=0.160310, regularization_loss=0.092516, gravity_loss=0.056932
iteration 1000: train_error_observed=0.159202, test_error_observed=0.248918, observed_loss=0.159202, regularization_loss=0.092791, gravity_loss=0.056917
iteration 1010: train_error_observed=0.158106, test_error_observed=0.248540, observed_loss=0.158106, regularization_loss=0.093066, gravity_loss=0.056897
iteration 1020: train_error_observed=0.157022, test_error_observed=0.248172, observed_loss=0.157022, regularization_loss=0.093343, gravity_loss=0.056872
iteration 1030: train_error_observed=0.155950, test_error_observed=0.247812, observed_loss=0.155950, regularization_loss=0.093621, gravity_loss=0.056843
iteration 1040: train_error_observed=0.154889, test_error_observed=0.247462, observed_loss=0.154889, regularization_loss=0.093899, gravity_loss=0.056810
iteration 1050: train_error_observed=0.153840, test_error_observed=0.247120, observed_loss=0.153840, regularization_loss=0.094179, gravity_loss=0.056772
iteration 1060: train_error_observed=0.152802, test_error_observed=0.246786, observed_loss=0.152802, regularization_loss=0.094459, gravity_loss=0.056731
iteration 1070: train_error_observed=0.151775, test_error_observed=0.246460, observed_loss=0.151775, regularization_loss=0.094739, gravity_loss=0.056686
iteration 1080: train_error_observed=0.150759, test_error_observed=0.246142, observed_loss=0.150759, regularization_loss=0.095020, gravity_loss=0.056637
iteration 1090: train_error_observed=0.149754, test_error_observed=0.245832, observed_loss=0.149754, regularization_loss=0.095302, gravity_loss=0.056585
iteration 1100: train_error_observed=0.148760, test_error_observed=0.245529, observed_loss=0.148760, regularization_loss=0.095584, gravity_loss=0.056530
iteration 1110: train_error_observed=0.147776, test_error_observed=0.245233, observed_loss=0.147776, regularization_loss=0.095866, gravity_loss=0.056471
iteration 1120: train_error_observed=0.146803, test_error_observed=0.244944, observed_loss=0.146803, regularization_loss=0.096148, gravity_loss=0.056410
iteration 1130: train_error_observed=0.145840, test_error_observed=0.244661, observed_loss=0.145840, regularization_loss=0.096431, gravity_loss=0.056346
iteration 1140: train_error_observed=0.144887, test_error_observed=0.244385, observed_loss=0.144887, regularization_loss=0.096713, gravity_loss=0.056279
iteration 1150: train_error_observed=0.143945, test_error_observed=0.244115, observed_loss=0.143945, regularization_loss=0.096995, gravity_loss=0.056210
iteration 1160: train_error_observed=0.143013, test_error_observed=0.243851, observed_loss=0.143013, regularization_loss=0.097278, gravity_loss=0.056139
iteration 1170: train_error_observed=0.142091, test_error_observed=0.243594, observed_loss=0.142091, regularization_loss=0.097559, gravity_loss=0.056065
iteration 1180: train_error_observed=0.141179, test_error_observed=0.243342, observed_loss=0.141179, regularization_loss=0.097841, gravity_loss=0.055989
iteration 1190: train_error_observed=0.140277, test_error_observed=0.243095, observed_loss=0.140277, regularization_loss=0.098122, gravity_loss=0.055911
iteration 1200: train_error_observed=0.139385, test_error_observed=0.242854, observed_loss=0.139385, regularization_loss=0.098403, gravity_loss=0.055832
iteration 1210: train_error_observed=0.138502, test_error_observed=0.242619, observed_loss=0.138502, regularization_loss=0.098682, gravity_loss=0.055751
iteration 1220: train_error_observed=0.137629, test_error_observed=0.242388, observed_loss=0.137629, regularization_loss=0.098962, gravity_loss=0.055668
iteration 1230: train_error_observed=0.136766, test_error_observed=0.242163, observed_loss=0.136766, regularization_loss=0.099240, gravity_loss=0.055583
iteration 1240: train_error_observed=0.135912, test_error_observed=0.241942, observed_loss=0.135912, regularization_loss=0.099518, gravity_loss=0.055497
iteration 1250: train_error_observed=0.135068, test_error_observed=0.241727, observed_loss=0.135068, regularization_loss=0.099795, gravity_loss=0.055410
iteration 1260: train_error_observed=0.134233, test_error_observed=0.241515, observed_loss=0.134233, regularization_loss=0.100070, gravity_loss=0.055321
iteration 1270: train_error_observed=0.133408, test_error_observed=0.241309, observed_loss=0.133408, regularization_loss=0.100345, gravity_loss=0.055232
iteration 1280: train_error_observed=0.132591, test_error_observed=0.241107, observed_loss=0.132591, regularization_loss=0.100619, gravity_loss=0.055141
iteration 1290: train_error_observed=0.131784, test_error_observed=0.240909, observed_loss=0.131784, regularization_loss=0.100891, gravity_loss=0.055049
iteration 1300: train_error_observed=0.130986, test_error_observed=0.240716, observed_loss=0.130986, regularization_loss=0.101163, gravity_loss=0.054956
iteration 1310: train_error_observed=0.130197, test_error_observed=0.240526, observed_loss=0.130197, regularization_loss=0.101433, gravity_loss=0.054863
iteration 1320: train_error_observed=0.129417, test_error_observed=0.240341, observed_loss=0.129417, regularization_loss=0.101701, gravity_loss=0.054768
iteration 1330: train_error_observed=0.128646, test_error_observed=0.240159, observed_loss=0.128646, regularization_loss=0.101969, gravity_loss=0.054673
iteration 1340: train_error_observed=0.127883, test_error_observed=0.239982, observed_loss=0.127883, regularization_loss=0.102234, gravity_loss=0.054577
iteration 1350: train_error_observed=0.127129, test_error_observed=0.239808, observed_loss=0.127129, regularization_loss=0.102499, gravity_loss=0.054481
iteration 1360: train_error_observed=0.126384, test_error_observed=0.239638, observed_loss=0.126384, regularization_loss=0.102762, gravity_loss=0.054384
iteration 1370: train_error_observed=0.125647, test_error_observed=0.239471, observed_loss=0.125647, regularization_loss=0.103023, gravity_loss=0.054287
iteration 1380: train_error_observed=0.124919, test_error_observed=0.239308, observed_loss=0.124919, regularization_loss=0.103283, gravity_loss=0.054189
iteration 1390: train_error_observed=0.124199, test_error_observed=0.239148, observed_loss=0.124199, regularization_loss=0.103541, gravity_loss=0.054091
iteration 1400: train_error_observed=0.123488, test_error_observed=0.238992, observed_loss=0.123488, regularization_loss=0.103798, gravity_loss=0.053992
iteration 1410: train_error_observed=0.122784, test_error_observed=0.238838, observed_loss=0.122784, regularization_loss=0.104052, gravity_loss=0.053894
iteration 1420: train_error_observed=0.122089, test_error_observed=0.238688, observed_loss=0.122089, regularization_loss=0.104306, gravity_loss=0.053795
iteration 1430: train_error_observed=0.121401, test_error_observed=0.238542, observed_loss=0.121401, regularization_loss=0.104557, gravity_loss=0.053695
iteration 1440: train_error_observed=0.120722, test_error_observed=0.238398, observed_loss=0.120722, regularization_loss=0.104807, gravity_loss=0.053596
iteration 1450: train_error_observed=0.120050, test_error_observed=0.238257, observed_loss=0.120050, regularization_loss=0.105055, gravity_loss=0.053497
iteration 1460: train_error_observed=0.119386, test_error_observed=0.238119, observed_loss=0.119386, regularization_loss=0.105301, gravity_loss=0.053397
iteration 1470: train_error_observed=0.118730, test_error_observed=0.237984, observed_loss=0.118730, regularization_loss=0.105545, gravity_loss=0.053298
iteration 1480: train_error_observed=0.118081, test_error_observed=0.237851, observed_loss=0.118081, regularization_loss=0.105787, gravity_loss=0.053198
iteration 1490: train_error_observed=0.117440, test_error_observed=0.237722, observed_loss=0.117440, regularization_loss=0.106028, gravity_loss=0.053099
iteration 1500: train_error_observed=0.116806, test_error_observed=0.237595, observed_loss=0.116806, regularization_loss=0.106267, gravity_loss=0.052999
iteration 1510: train_error_observed=0.116179, test_error_observed=0.237470, observed_loss=0.116179, regularization_loss=0.106504, gravity_loss=0.052900
iteration 1520: train_error_observed=0.115560, test_error_observed=0.237348, observed_loss=0.115560, regularization_loss=0.106739, gravity_loss=0.052801
iteration 1530: train_error_observed=0.114948, test_error_observed=0.237229, observed_loss=0.114948, regularization_loss=0.106972, gravity_loss=0.052702
iteration 1540: train_error_observed=0.114343, test_error_observed=0.237112, observed_loss=0.114343, regularization_loss=0.107204, gravity_loss=0.052603
iteration 1550: train_error_observed=0.113744, test_error_observed=0.236997, observed_loss=0.113744, regularization_loss=0.107433, gravity_loss=0.052504
iteration 1560: train_error_observed=0.113153, test_error_observed=0.236885, observed_loss=0.113153, regularization_loss=0.107661, gravity_loss=0.052406
iteration 1570: train_error_observed=0.112568, test_error_observed=0.236775, observed_loss=0.112568, regularization_loss=0.107887, gravity_loss=0.052308
iteration 1580: train_error_observed=0.111990, test_error_observed=0.236667, observed_loss=0.111990, regularization_loss=0.108111, gravity_loss=0.052210
iteration 1590: train_error_observed=0.111419, test_error_observed=0.236562, observed_loss=0.111419, regularization_loss=0.108333, gravity_loss=0.052112
iteration 1600: train_error_observed=0.110854, test_error_observed=0.236458, observed_loss=0.110854, regularization_loss=0.108553, gravity_loss=0.052015
iteration 1610: train_error_observed=0.110296, test_error_observed=0.236357, observed_loss=0.110296, regularization_loss=0.108771, gravity_loss=0.051918
iteration 1620: train_error_observed=0.109744, test_error_observed=0.236258, observed_loss=0.109744, regularization_loss=0.108988, gravity_loss=0.051821
iteration 1630: train_error_observed=0.109198, test_error_observed=0.236160, observed_loss=0.109198, regularization_loss=0.109202, gravity_loss=0.051725
iteration 1640: train_error_observed=0.108659, test_error_observed=0.236065, observed_loss=0.108659, regularization_loss=0.109415, gravity_loss=0.051629
iteration 1650: train_error_observed=0.108125, test_error_observed=0.235972, observed_loss=0.108125, regularization_loss=0.109626, gravity_loss=0.051534
iteration 1660: train_error_observed=0.107598, test_error_observed=0.235880, observed_loss=0.107598, regularization_loss=0.109835, gravity_loss=0.051439
iteration 1670: train_error_observed=0.107077, test_error_observed=0.235790, observed_loss=0.107077, regularization_loss=0.110043, gravity_loss=0.051344
iteration 1680: train_error_observed=0.106561, test_error_observed=0.235703, observed_loss=0.106561, regularization_loss=0.110248, gravity_loss=0.051249
iteration 1690: train_error_observed=0.106051, test_error_observed=0.235617, observed_loss=0.106051, regularization_loss=0.110452, gravity_loss=0.051156
iteration 1700: train_error_observed=0.105547, test_error_observed=0.235532, observed_loss=0.105547, regularization_loss=0.110654, gravity_loss=0.051062
iteration 1710: train_error_observed=0.105049, test_error_observed=0.235449, observed_loss=0.105049, regularization_loss=0.110854, gravity_loss=0.050969
iteration 1720: train_error_observed=0.104556, test_error_observed=0.235369, observed_loss=0.104556, regularization_loss=0.111053, gravity_loss=0.050876
iteration 1730: train_error_observed=0.104069, test_error_observed=0.235289, observed_loss=0.104069, regularization_loss=0.111249, gravity_loss=0.050784
iteration 1740: train_error_observed=0.103587, test_error_observed=0.235211, observed_loss=0.103587, regularization_loss=0.111444, gravity_loss=0.050693
iteration 1750: train_error_observed=0.103110, test_error_observed=0.235135, observed_loss=0.103110, regularization_loss=0.111638, gravity_loss=0.050601
iteration 1760: train_error_observed=0.102639, test_error_observed=0.235061, observed_loss=0.102639, regularization_loss=0.111829, gravity_loss=0.050511
iteration 1770: train_error_observed=0.102173, test_error_observed=0.234987, observed_loss=0.102173, regularization_loss=0.112019, gravity_loss=0.050421
iteration 1780: train_error_observed=0.101712, test_error_observed=0.234916, observed_loss=0.101712, regularization_loss=0.112207, gravity_loss=0.050331
iteration 1790: train_error_observed=0.101257, test_error_observed=0.234846, observed_loss=0.101257, regularization_loss=0.112394, gravity_loss=0.050242
iteration 1800: train_error_observed=0.100806, test_error_observed=0.234777, observed_loss=0.100806, regularization_loss=0.112579, gravity_loss=0.050153
iteration 1810: train_error_observed=0.100360, test_error_observed=0.234709, observed_loss=0.100360, regularization_loss=0.112762, gravity_loss=0.050065
iteration 1820: train_error_observed=0.099919, test_error_observed=0.234643, observed_loss=0.099919, regularization_loss=0.112944, gravity_loss=0.049977
iteration 1830: train_error_observed=0.099483, test_error_observed=0.234579, observed_loss=0.099483, regularization_loss=0.113124, gravity_loss=0.049890
iteration 1840: train_error_observed=0.099051, test_error_observed=0.234515, observed_loss=0.099051, regularization_loss=0.113302, gravity_loss=0.049803
iteration 1850: train_error_observed=0.098625, test_error_observed=0.234453, observed_loss=0.098625, regularization_loss=0.113479, gravity_loss=0.049717
iteration 1860: train_error_observed=0.098202, test_error_observed=0.234392, observed_loss=0.098202, regularization_loss=0.113655, gravity_loss=0.049631
iteration 1870: train_error_observed=0.097785, test_error_observed=0.234333, observed_loss=0.097785, regularization_loss=0.113829, gravity_loss=0.049546
iteration 1880: train_error_observed=0.097372, test_error_observed=0.234274, observed_loss=0.097372, regularization_loss=0.114001, gravity_loss=0.049461
iteration 1890: train_error_observed=0.096963, test_error_observed=0.234217, observed_loss=0.096963, regularization_loss=0.114172, gravity_loss=0.049377
iteration 1900: train_error_observed=0.096559, test_error_observed=0.234161, observed_loss=0.096559, regularization_loss=0.114341, gravity_loss=0.049293
iteration 1910: train_error_observed=0.096159, test_error_observed=0.234107, observed_loss=0.096159, regularization_loss=0.114509, gravity_loss=0.049210
iteration 1920: train_error_observed=0.095763, test_error_observed=0.234053, observed_loss=0.095763, regularization_loss=0.114675, gravity_loss=0.049128
iteration 1930: train_error_observed=0.095372, test_error_observed=0.234000, observed_loss=0.095372, regularization_loss=0.114840, gravity_loss=0.049046
iteration 1940: train_error_observed=0.094985, test_error_observed=0.233949, observed_loss=0.094985, regularization_loss=0.115003, gravity_loss=0.048964
iteration 1950: train_error_observed=0.094601, test_error_observed=0.233898, observed_loss=0.094601, regularization_loss=0.115165, gravity_loss=0.048883
iteration 1960: train_error_observed=0.094222, test_error_observed=0.233849, observed_loss=0.094222, regularization_loss=0.115326, gravity_loss=0.048803
iteration 1970: train_error_observed=0.093847, test_error_observed=0.233801, observed_loss=0.093847, regularization_loss=0.115485, gravity_loss=0.048723
iteration 1980: train_error_observed=0.093476, test_error_observed=0.233754, observed_loss=0.093476, regularization_loss=0.115643, gravity_loss=0.048643
iteration 1990: train_error_observed=0.093108, test_error_observed=0.233707, observed_loss=0.093108, regularization_loss=0.115800, gravity_loss=0.048564
iteration 2000: train_error_observed=0.092745, test_error_observed=0.233662, observed_loss=0.092745, regularization_loss=0.115955, gravity_loss=0.048486
[{'train_error_observed': 0.09274472, 'test_error_observed': 0.23366201},
{'observed_loss': 0.09274472,
'regularization_loss': 0.11595495,
'gravity_loss': 0.04848592}]
In both models, the train and test errors drop steeply as training progresses, although the regularized model has a higher MSE on both the training and test sets. Even so, adding regularization improves the quality of the recommendations, as the artist_neighbors() outputs below illustrate. The evaluation later in the book also shows that regularization improves the model's performance. The test error decreases in the same way as the train error, but it plateaus around the 1000-iteration mark. Over the iterations, the logged regularization_loss rises steadily while the gravity_loss falls slightly. We add the following regularization terms to our model.
Regularization of the model parameters. This is a common \(\ell_2\) regularization term on the embedding matrices, given by \(r(U, V) = \frac{1}{N} \sum_i \|U_i\|^2 + \frac{1}{M}\sum_j \|V_j\|^2\), where \(U\) holds the \(N\) user embeddings and \(V\) holds the \(M\) artist embeddings.
A global prior called the gravity term, which pushes the prediction for every user-artist pair towards zero, including unobserved pairs. It is given by \(g(U, V) = \frac{1}{MN} \sum_{i = 1}^N \sum_{j = 1}^M \langle U_i, V_j \rangle^2\).
These terms modify the “global” loss (the sum of the observed loss and the regularization losses) so that the optimization algorithm is driven in the desired direction, i.e. away from overfitting.
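A minimal NumPy sketch of the two penalty terms above can help check the formulas by hand. It assumes that U is the N×k user embedding matrix and V is the M×k artist embedding matrix. The helper name regularization_losses is hypothetical. The sketch does not reproduce the notebook's own implementation, and it does not reproduce the coefficients that weight each term in the global loss. The gravity term is computed without building the dense N×M prediction matrix, using the identity \(\sum_{i,j} \langle U_i, V_j\rangle^2 = \sum (U^\top U) \odot (V^\top V)\).

```python
import numpy as np

def regularization_losses(U, V):
    """Return (r, g): the l2 regularization term and the gravity term.

    U: [N, k] user embeddings; V: [M, k] artist embeddings (assumed shapes).
    """
    N, M = U.shape[0], V.shape[0]
    # r(U, V) = 1/N * sum_i ||U_i||^2 + 1/M * sum_j ||V_j||^2
    r = np.sum(U ** 2) / N + np.sum(V ** 2) / M
    # g(U, V) = 1/(MN) * sum_ij <U_i, V_j>^2, computed from the k x k Gram
    # matrices instead of the dense N x M matrix U V^T.
    g = np.sum((U.T @ U) * (V.T @ V)) / (N * M)
    return r, g

rng = np.random.default_rng(0)
U = rng.normal(size=(5, 3))
V = rng.normal(size=(4, 3))
r, g = regularization_losses(U, V)

# Brute-force check of the gravity identity on this small example.
g_dense = np.mean((U @ V.T) ** 2)
print(np.isclose(g, g_dense))  # True
```

The Gram-matrix form costs O((N + M)k²) rather than O(NMk), which matters when the user-artist matrix is large.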
Evaluating the embeddings
We use two similarity measures to inspect the robustness of our system:
Dot product: score of artist j \(\langle u, V_j \rangle\).
Cosine angle: score of artist j \(\frac{\langle u, V_j \rangle}{\|u\|\|V_j\|}\).
DOT = 'dot'
COSINE = 'cosine'
def compute_scores(query_embedding, item_embeddings, measure=DOT):
    """Computes the scores of the candidates given a query.
    Args:
        query_embedding: a vector of shape [k], representing the query embedding.
        item_embeddings: a matrix of shape [N, k], such that row i is the embedding
            of item i.
        measure: a string specifying the similarity measure to be used. Can be
            either DOT or COSINE.
    Returns:
        scores: a vector of shape [N], such that scores[i] is the score of item i.
    """
    u = query_embedding
    V = item_embeddings
    if measure == COSINE:
        # Normalize to unit length so the dot product becomes the cosine.
        V = V / np.linalg.norm(V, axis=1, keepdims=True)
        u = u / np.linalg.norm(u)
    scores = u.dot(V.T)
    return scores
def user_recommendations(model, user_id, k=15, measure=DOT, exclude_rated=False):
    """Returns the top-k artists for a user, ranked by the given similarity measure.
    Note: exclude_rated is accepted but not used; artists the user has already
    listened to are not filtered out.
    """
    scores = compute_scores(
        model.embeddings["userID"][user_id], model.embeddings["artistID"], measure)
    df = pd.DataFrame({
        'score': list(scores),
        'name': artists.sort_values('artistID', ascending=True)['name'],
        'most assigned tag': artists.sort_values('artistID', ascending=True)['mostCommonGenre']
    })
    return df.sort_values(['score'], ascending=False).head(k)
def artist_neighbors(model, title_substring, measure=DOT, k=6):
    # Map original Last.FM artist IDs back to embedding row indices.
    inv_artist_id_mapping = {orig_id: new_id for new_id, orig_id in orginal_artist_ids.items()}
    # Search for artist ids whose name contains the given substring (literal match, not regex).
    ids = artists[artists['name'].str.contains(title_substring, regex=False)].artistID.values
    titles = artists[artists.artistID.isin(ids)]['name'].values
    if len(titles) == 0:
        raise ValueError("Found no artists with name %s" % title_substring)
    print("Nearest neighbors of : %s." % titles[0])
    if len(titles) > 1:
        print("[Found more than one matching artist. Other candidates: {}]".format(
            ", ".join(titles[1:])))
    artist_id_mapped = inv_artist_id_mapping[ids[0]]
    scores = compute_scores(
        model.embeddings["artistID"][artist_id_mapped], model.embeddings["artistID"],
        measure)
    score_key = measure + ' score'
    df = pd.DataFrame({
        score_key: list(scores),
        'name': artists.sort_values('artistID', ascending=True)['name'],
        'most assigned tag': artists.sort_values('artistID', ascending=True)['mostCommonGenre']
    })
    return df.sort_values([score_key], ascending=False).head(k)
Here, we find the artists most similar to the band The Cure. We also include the most assigned tag for each artist. The recommendations are consistent with our domain knowledge of bands similar to The Cure.
artist_neighbors(vanilla_model, "The Cure", DOT)
Nearest neighbors of : The Cure.
| | dot score | name | most assigned tag |
|---|---|---|---|
| 9437 | 0.529 | The Cure | chillout |
| 17278 | 0.527 | Kings of Leon | chillout |
| 58990 | 0.522 | Queen | 80's |
| 3259 | 0.522 | Coldplay | chillout |
| 26579 | 0.522 | Alanis Morissette | chillout |
| 28075 | 0.521 | Kanye West | electronic |
artist_neighbors(vanilla_model, "The Cure", COSINE)
Nearest neighbors of : The Cure.
| | cosine score | name | most assigned tag |
|---|---|---|---|
| 9437 | 1.000 | The Cure | chillout |
| 10850 | 0.981 | Placebo | chillout |
| 4936 | 0.977 | Depeche Mode | chillout |
| 16680 | 0.976 | The Beatles | chillout |
| 8273 | 0.975 | Radiohead | chillout |
| 43413 | 0.970 | David Bowie | chillout |
artist_neighbors(reg_model, "The Cure", DOT)
Nearest neighbors of : The Cure.
| | dot score | name | most assigned tag |
|---|---|---|---|
| 16680 | 3.297 | The Beatles | chillout |
| 12363 | 3.217 | Muse | chillout |
| 9437 | 3.213 | The Cure | chillout |
| 18364 | 3.206 | Nirvana | pop |
| 3259 | 3.186 | Coldplay | chillout |
| 8273 | 3.177 | Radiohead | chillout |
artist_neighbors(reg_model, "The Cure", COSINE)
Nearest neighbors of : The Cure.
| | cosine score | name | most assigned tag |
|---|---|---|---|
| 9437 | 1.000 | The Cure | chillout |
| 115196 | 0.970 | Hole | female vocalist |
| 8273 | 0.970 | Radiohead | chillout |
| 16680 | 0.964 | The Beatles | chillout |
| 43413 | 0.964 | David Bowie | chillout |
| 48889 | 0.959 | R.E.M. | atmospheric |
We observe that the dot product tends to recommend more popular artists such as Nirvana and The Beatles, whereas cosine similarity recommends more obscure artists. This is likely because, in matrix factorization, the norm of an artist's embedding is often correlated with the artist's popularity. The dot product rewards large norms, while cosine similarity normalizes them away. The regularised model seems to output better recommendations, because the most assigned tag varies less across its recommendations than across the vanilla model's. In addition, the vanilla model recommended Marilyn Manson in our initial run, and we consider Marilyn Manson very dissimilar to The Cure. However, these observations may change when you rerun the model, because the embeddings are initialized randomly from a Gaussian distribution.
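A toy two-item example makes the popularity effect concrete. The numbers are made up and do not come from the Last.FM data. The snippet reuses the logic of compute_scores, re-implemented in a hypothetical helper scores so that it runs on its own. Under the dot product, a large-norm item wins even when it is less aligned with the query. Cosine similarity ranks the better-aligned item first.

```python
import numpy as np

def scores(u, V, measure="dot"):
    # Same logic as compute_scores: cosine is the dot product of unit vectors.
    if measure == "cosine":
        V = V / np.linalg.norm(V, axis=1, keepdims=True)
        u = u / np.linalg.norm(u)
    return V @ u

u = np.array([1.0, 0.0])      # query (user or artist) embedding
V = np.array([
    [1.0, 0.1],               # item 0: well aligned with u, small norm ("niche")
    [3.0, 3.0],               # item 1: less aligned, large norm ("popular")
])
print(scores(u, V, "dot"))     # [1. 3.] -> item 1 ranks first
print(scores(u, V, "cosine"))  # approx [0.995 0.707] -> item 0 ranks first
```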
def artist_embedding_norm(models):
    """Visualizes the norm of the artist embeddings against their number of user-artist interactions.
    Args:
        models: A MFModel object, or a list of MFModel objects.
    """
    if not isinstance(models, list):
        models = [models]
    df = pd.DataFrame({
        'name': artists.sort_values('artistID', ascending=True)['name'].values,
        'number of user-artist interactions': user_artists[['artistID', 'userID']].sort_values('artistID', ascending=True).groupby('artistID').count()['userID'].values,
    })
    charts = []
    brush = alt.selection_interval()
    for i, model in enumerate(models):
        norm_key = 'norm' + str(i)
        df[norm_key] = np.linalg.norm(model.embeddings["artistID"], axis=1)
        nearest = alt.selection(
            type='single', encodings=['x', 'y'], on='mouseover', nearest=True,
            empty='none')
        base = alt.Chart().mark_circle().encode(
            x='number of user-artist interactions',
            y=norm_key,
            color=alt.condition(brush, alt.value('#4c78a8'), alt.value('lightgray'))
        ).properties(
            selection=nearest).add_selection(brush)
        text = alt.Chart().mark_text(align='center', dx=5, dy=-5).encode(
            x='number of user-artist interactions', y=norm_key,
            text=alt.condition(nearest, 'name', alt.value('')))
        charts.append(alt.layer(base, text))
    return alt.hconcat(*charts, data=df)
artist_embedding_norm(reg_model)
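The norm-versus-popularity relationship can also be checked numerically, not only by eye. The sketch below uses a hypothetical helper, norm_popularity_correlation, to compute Spearman's rank correlation between embedding norms and interaction counts with only NumPy and pandas. In the notebook, one would presumably pass reg_model.embeddings["artistID"] together with the per-artist interaction counts built inside artist_embedding_norm, assuming both are aligned by artistID. Here the sketch runs on synthetic embeddings only.

```python
import numpy as np
import pandas as pd

def norm_popularity_correlation(embeddings, counts):
    """Spearman rank correlation between embedding norms and interaction counts.

    embeddings: [M, k] array whose row j is the embedding of artist j.
    counts: length-M array whose entry j is artist j's number of interactions.
    """
    norms = pd.Series(np.linalg.norm(embeddings, axis=1))
    counts = pd.Series(np.asarray(counts, dtype=float))
    # Spearman's rho is the Pearson correlation of the ranks (ties get average ranks).
    return norms.rank().corr(counts.rank())

# Synthetic data: norms that grow with the interaction count.
rng = np.random.default_rng(0)
counts = rng.integers(1, 1000, size=200)
directions = rng.normal(size=(200, 8))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
embeddings = directions * np.log1p(counts)[:, None]
print(norm_popularity_correlation(embeddings, counts))
```

Rank correlation is used instead of Pearson correlation because interaction counts are heavily skewed, and only the ordering matters for the claim.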
def visualize_movie_embeddings(data, x, y):
    genre_filter = alt.selection_multi(fields=['top10TagValue'])
    genre_chart = alt.Chart().mark_bar().encode(
        x="count()",
        y=alt.Y('top10TagValue'),
        color=alt.condition(
            genre_filter,
            alt.Color("top10TagValue:N"),
            alt.value('lightgray'))
    ).properties(height=300, selection=genre_filter)
    nearest = alt.selection(
        type='single', encodings=['x', 'y'], on='mouseover', nearest=True,
        empty='none')
    base = alt.Chart().mark_circle().encode(
        x=x,
        y=y,
        color=alt.condition(genre_filter, "top10TagValue", alt.value("whitesmoke")),
    ).properties(
        width=600,
        height=600,
        selection=nearest)
    text = alt.Chart().mark_text(align='left', dx=5, dy=-5).encode(
        x=x,
        y=y,
        text=alt.condition(nearest, 'name', alt.value('')))
    return alt.hconcat(alt.layer(base, text), genre_chart, data=data)
def tsne_movie_embeddings(model):
    """Visualizes the artist embeddings, projected using t-SNE with the cosine measure.
    Args:
        model: A MFModel object.
    """
    tsne = sklearn.manifold.TSNE(
        n_components=2, perplexity=40, metric='cosine', early_exaggeration=10.0,
        init='pca', verbose=True, n_iter=400)
    print('Running t-SNE...')
    V_proj = tsne.fit_transform(model.embeddings["artistID"])
    # Store the 2-D coordinates on the artists dataframe for plotting.
    artists.loc[:, 'x'] = V_proj[:, 0]
    artists.loc[:, 'y'] = V_proj[:, 1]
    return visualize_movie_embeddings(artists, 'x', 'y')
T-distributed stochastic neighbor embedding (t-SNE) is a dimensionality reduction algorithm that is useful for visualizing high-dimensional data. We use it to visualize the artist embeddings of the regularised model. Because there are so many user-submitted semantic categories (tags), we color-code only the top 15 tags and label the rest as ‘N/A’. The sea of orange indicating ‘N/A’ makes these results difficult to interpret. Even so, the regularised model seems to adequately cluster artists of a similar genre in its embeddings.
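As a sanity check of how t-SNE behaves with the cosine metric, the sketch below projects synthetic embeddings drawn around three hypothetical "genre" centers. It then measures how often each point's nearest neighbor in the 2-D projection shares its label. The cluster setup and perplexity=10 are assumptions, with the low perplexity chosen because the sample is small. The notebook's n_iter argument is left out because recent scikit-learn releases deprecate it in favor of max_iter, so the default iteration count is used instead.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
n_per, k = 30, 16
centers = rng.normal(size=(3, k))              # three synthetic "genres"
labels = np.repeat(np.arange(3), n_per)
X = centers[labels] + 0.1 * rng.normal(size=(3 * n_per, k))

tsne = TSNE(n_components=2, perplexity=10, metric="cosine", init="pca", random_state=0)
X_2d = tsne.fit_transform(X)

# Fraction of points whose nearest neighbor in 2-D has the same label.
d = np.linalg.norm(X_2d[:, None, :] - X_2d[None, :, :], axis=-1)
np.fill_diagonal(d, np.inf)                    # a point is not its own neighbor
agreement = np.mean(labels[np.argmin(d, axis=1)] == labels)
print(X_2d.shape, agreement)
```

Assuming each artist has a single most assigned tag, the same nearest-neighbor agreement could be computed on the real projection to quantify the clustering described above.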
tsne_movie_embeddings(reg_model)
Running t-SNE...
[t-SNE] Computing 121 nearest neighbors...
[t-SNE] Indexed 17632 samples in 0.001s...
[t-SNE] Computed neighbors for 17632 samples in 4.843s...
[t-SNE] Computed conditional probabilities for sample 1000 / 17632
[t-SNE] Computed conditional probabilities for sample 2000 / 17632
[t-SNE] Computed conditional probabilities for sample 3000 / 17632
[t-SNE] Computed conditional probabilities for sample 4000 / 17632
[t-SNE] Computed conditional probabilities for sample 5000 / 17632
[t-SNE] Computed conditional probabilities for sample 6000 / 17632
[t-SNE] Computed conditional probabilities for sample 7000 / 17632
[t-SNE] Computed conditional probabilities for sample 8000 / 17632
[t-SNE] Computed conditional probabilities for sample 9000 / 17632
[t-SNE] Computed conditional probabilities for sample 10000 / 17632
[t-SNE] Computed conditional probabilities for sample 11000 / 17632
[t-SNE] Computed conditional probabilities for sample 12000 / 17632
[t-SNE] Computed conditional probabilities for sample 13000 / 17632
[t-SNE] Computed conditional probabilities for sample 14000 / 17632
[t-SNE] Computed conditional probabilities for sample 15000 / 17632
[t-SNE] Computed conditional probabilities for sample 16000 / 17632
[t-SNE] Computed conditional probabilities for sample 17000 / 17632
[t-SNE] Computed conditional probabilities for sample 17632 / 17632
[t-SNE] Mean sigma: 0.178678
[t-SNE] KL divergence after 250 iterations with early exaggeration: 77.263321
[t-SNE] KL divergence after 400 iterations: 2.773607
def m_embedding_norm(models):
    """Visualizes the norm of the artist embeddings against their number of interactions.
    Args:
        models: A MFModel object, or a list of MFModel objects.
    """
    if not isinstance(models, list):
        models = [models]
    df = pd.DataFrame({
        'title': artists.sort_values('artistID', ascending=True)['name'].values,
        'num_ratings': user_artists[['artistID', 'userID']].sort_values('artistID', ascending=True).groupby('artistID').count()['userID'].values,
    })
    charts = []
    brush = alt.selection_interval()
    for i, model in enumerate(models):
        norm_key = 'norm' + str(i)
        df[norm_key] = np.linalg.norm(model.embeddings["artistID"], axis=1)
        nearest = alt.selection(
            type='single', encodings=['x', 'y'], on='mouseover', nearest=True,
            empty='none')
        base = alt.Chart().mark_circle().encode(
            x='num_ratings',
            y=norm_key,
            color=alt.condition(brush, alt.value('#4c78a8'), alt.value('lightgray'))
        ).properties(
            selection=nearest).add_selection(brush)
        text = alt.Chart().mark_text(align='center', dx=5, dy=-5).encode(
            x='num_ratings', y=norm_key,
            text=alt.condition(nearest, 'title', alt.value('')))
        charts.append(alt.layer(base, text))
    return alt.hconcat(*charts, data=df)
Demo
You can find the artists most similar to a given artist (one contained in the Last.FM dataset) using the artist_neighbors() function. Similarly, you can find the top-k recommendations for a particular userID (0 to 1891) using the user_recommendations() function. The arguments are, in order: the model, the userID, the number of recommendations k (default 15), and the similarity measure. The similarity measure is either DOT or COSINE, the constants defined above, which hold the strings 'dot' and 'cosine'. The default is DOT.
user_recommendations(reg_model, 234, 10, COSINE)
| | score | name | most assigned tag |
|---|---|---|---|
| 126582 | 0.939 | Validuaté | N/A |
| 126513 | 0.919 | Graforréia Xilarmônica | rock |
| 126400 | 0.903 | The Vibrators | punk |
| 121134 | 0.901 | 7Seconds | 80s |
| 126554 | 0.899 | Moreira da Silva | N/A |
| 126539 | 0.892 | Menstruação Anarquika | N/A |
| 126490 | 0.891 | Street Bulldogs | N/A |
| 126433 | 0.875 | Violator | thrash metal |
| 126491 | 0.856 | Bandas Gaúchas - www.DownsMtv.com | N/A |
| 126505 | 0.835 | Planet Hemp | rock |
To further demonstrate the robustness of the system and gauge the serendipity of our model, we feed in the artists we listen to most on Spotify (i.e. an unknown user). Note that these artists must also be in the Last.FM dataset. The recommendation system should then output similar artists based on its artist embeddings, using the dot product as the similarity measure. We use the Spotipy library to interact with Spotify's API. However, the Spotify token is short-lived, and retrieving it requires signing in through a pop-up. Jupyter Book would therefore stall during the build while waiting for our response. For this reason, we list our top 5 artists manually, but we still provide the code used to retrieve the token for verification purposes.
"""
import spotipy
from spotipy.oauth2 import SpotifyOAuth
client_id = <insert_your_client_id>
client_secret = <insert your client secret>
redirect_url = '<insert your redirect uri>
scope = "user-top-read user-read-playback-state streaming ugc-image-upload playlist-modify-public"
authenticate_manager = spotipy.oauth2.SpotifyOAuth(client_id = client_id,client_secret = client_secret,redirect_uri =redirect_url,scope =scope,show_dialog = True)
sp = spotipy.Spotify(auth_manager=authenticate_manager)
artists_long = sp.current_user_top_artists(limit=5, time_range="long_term")
"""
top_5_artists = [
    'Coldplay',
    'Paramore',
    'Arctic Monkeys',
    'Lily Allen',
    'Miley Cyrus'
]
spotify_recommendations_df = pd.DataFrame()
for artist in top_5_artists:
    # Nearest neighbors of each seed artist under the dot-product measure.
    similar_artist_df = artist_neighbors(reg_model, artist)[['name', 'dot score']]
    spotify_recommendations_df = pd.concat([spotify_recommendations_df, similar_artist_df])
spotify_recommendations_df.sort_values('dot score', ascending=False).head(10)
Nearest neighbors of : Coldplay.
[Found more than one matching artist. Other candidates: Jay-Z & Coldplay, Coldplay/U2]
Nearest neighbors of : Paramore.
[Found more than one matching artist. Other candidates: Paramore攀]
Nearest neighbors of : Arctic Monkeys.
[Found more than one matching artist. Other candidates: Arctic Monkeys vs The Killers]
Nearest neighbors of : Lily Allen.
Nearest neighbors of : Miley Cyrus.
[Found more than one matching artist. Other candidates: Miley Cyrus攀, Demi Lovato Ft. Miley Cyrus Ft. Selena Gomez Ft. Jonas Brothers, Miley Cyrus and Billy Ray Cyrus, Miley Cyrus and John Travolta, Hannah Montana and Miley Cyrus]
| | name | dot score |
|---|---|---|
| 3259 | Coldplay | 3.674 |
| 37842 | Paramore | 3.603 |
| 6543 | Lady Gaga | 3.585 |
| 12363 | Muse | 3.576 |
| 30355 | Linkin Park | 3.527 |
| 36290 | Eminem | 3.519 |
| 17472 | The Killers | 3.513 |
| 8965 | Michael Jackson | 3.506 |
| 30355 | Linkin Park | 3.505 |
| 17832 | Green Day | 3.501 |
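We believe these recommendations are good: when the model was given one of our top five artists, it recommended other artists from the same top five. Note that Linkin Park appears twice in the table. It is a neighbor of more than one seed artist, and the neighbor lists of the seed artists are concatenated without deduplication.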
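One way to avoid duplicates such as Linkin Park is to merge the per-seed neighbor tables and keep each artist only once, at its best score. Below is a minimal pandas sketch with a hypothetical helper, merge_neighbor_lists. The toy rows reuse scores from the table above.

```python
import pandas as pd

def merge_neighbor_lists(neighbor_dfs, score_col="dot score", k=10):
    """Combine per-seed neighbor tables, keeping each artist once at its best score."""
    merged = pd.concat(neighbor_dfs, ignore_index=True)
    # Sort descending so that drop_duplicates keeps the highest score per name.
    merged = merged.sort_values(score_col, ascending=False)
    merged = merged.drop_duplicates(subset="name", keep="first")
    return merged.head(k).reset_index(drop=True)

a = pd.DataFrame({"name": ["Coldplay", "Linkin Park"], "dot score": [3.674, 3.527]})
b = pd.DataFrame({"name": ["Linkin Park", "Green Day"], "dot score": [3.505, 3.501]})
result = merge_neighbor_lists([a, b])
print(result)
```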
Evaluation Code
This is the code needed to produce the in-depth model comparison. As we decided to use different notebooks for different models, the results of this code will be combined and explained later in the book.
## create holdout test set for each user (15 items)
user_artists = pd.read_csv('data/user_artists.dat', sep='\t')
user_ids = []
holdout_artists = []
for user_id in user_artists.userID.unique():
    # NOTE: sort_values sorts in ascending order by default, so head(15) selects the
    # 15 artists with the LOWEST listening weight for this user, not the most-listened ones.
    top_15_artists = user_artists[user_artists.userID == user_id].sort_values(by='weight').head(15).artistID.tolist()
    # Keep only users with at least 15 artists.
    if len(top_15_artists) == 15:
        holdout_artists.append(top_15_artists)
        user_ids.append(user_id)
holdout_df = pd.DataFrame(data={'userID': user_ids, 'holdout_artists': holdout_artists})
holdout_df.to_csv('data/evaluation/test-set.csv', index=False)
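The holdout loop above filters the users one at a time. A vectorized pandas alternative is sketched below with a hypothetical helper, top_n_per_user. Setting largest=False reproduces the ascending selection used above. Setting largest=True selects each user's most-listened artists, on the assumption that these are the intended "top 15". The column names follow user_artists.dat (userID, artistID, weight).

```python
import pandas as pd

def top_n_per_user(user_artists, n=15, largest=True):
    """Return each user's n artists with the largest (or smallest) weight,
    keeping only users with at least n artists."""
    ordered = user_artists.sort_values(["userID", "weight"], ascending=[True, not largest])
    top = ordered.groupby("userID").head(n)                 # first n rows per user
    lists = top.groupby("userID")["artistID"].agg(list)     # one list per user
    lists = lists[lists.apply(len) == n]                    # drop users with < n artists
    return lists.rename("holdout_artists").reset_index()

toy = pd.DataFrame({
    "userID": [1, 1, 1, 2],
    "artistID": [10, 11, 12, 20],
    "weight": [10, 50, 30, 5],
})
res = top_n_per_user(toy, n=2)
print(res)
```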
## Finding the vanilla and regularised models' predictions for each user.
def get_top_15_model_predictions(model, measure):
    """Computes the top 15 predictions for a given model.
    Args:
        model: the trained model (e.g. vanilla_model or reg_model).
        measure: a string specifying the similarity measure to be used. Can be
            either DOT or COSINE.
    Returns:
        predicted_df: a dataframe containing userIDs, their top 15 artists according to the model, and the corresponding scores.
    """
    artist_name_id_dict = dict(zip(artists['name'], artists['artistID']))
    user_ids = []
    predicted_artists = []
    scores_list = []
    for new_user_id, orginal_user_id in orginal_user_ids.items():
        # Compute the recommendations once and reuse them for both names and scores.
        recs = user_recommendations(model, new_user_id, k=15, measure=measure)
        top_15_names = recs['name'].values
        top_15_scores = recs['score'].values.tolist()
        artist_ids = [artist_name_id_dict[name] for name in top_15_names]
        predicted_artists.append(artist_ids)
        user_ids.append(orginal_user_id)
        scores_list.append(top_15_scores)
    predicted_df = pd.DataFrame(data={'userID': user_ids, 'predictions_artists': predicted_artists, 'score': scores_list})
    return predicted_df
# Save the recommended artists as dataframes in the data/evaluation folder.
vanilla_dot_pred = get_top_15_model_predictions(vanilla_model, measure=DOT)
vanilla_cos_pred = get_top_15_model_predictions(vanilla_model, measure=COSINE)
reg_dot_pred = get_top_15_model_predictions(reg_model, measure=DOT)
reg_cos_pred = get_top_15_model_predictions(reg_model, measure=COSINE)
vanilla_dot_pred.to_csv('data/evaluation/vannila_dot_pred.csv',index=False)
vanilla_cos_pred.to_csv('data/evaluation/vanila_cos_pred.csv',index=False)
reg_dot_pred.to_csv('data/evaluation/reg_dot_pred.csv',index=False)
reg_cos_pred.to_csv('data/evaluation/reg_cos_pred.csv',index=False)
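The saved predictions and holdout sets can then be compared. The metrics used in the book's evaluation are described later and may differ from the ones here. The sketch below uses precision@k and hit rate purely as illustrative choices, and its helpers precision_at_k and load_list_csv are hypothetical. Because to_csv writes list columns as strings such as "[1, 2, 3]", the loader parses them back with ast.literal_eval through read_csv's converters argument.

```python
import ast
import pandas as pd

def precision_at_k(holdout_df, pred_df, k=15):
    """Mean precision@k and hit rate over users present in both dataframes.

    holdout_df: columns 'userID' and 'holdout_artists' (lists of artist IDs).
    pred_df: columns 'userID' and 'predictions_artists' (ranked lists of artist IDs).
    """
    merged = holdout_df.merge(pred_df, on="userID")
    if merged.empty:
        raise ValueError("No users in common between holdout and predictions.")
    precisions, hits = [], []
    for holdout, preds in zip(merged["holdout_artists"], merged["predictions_artists"]):
        overlap = len(set(holdout) & set(preds[:k]))
        precisions.append(overlap / k)   # fraction of the top-k that were held out
        hits.append(overlap > 0)         # at least one held-out artist recovered
    return sum(precisions) / len(precisions), sum(hits) / len(hits)

def load_list_csv(path, list_col):
    """Read a CSV written by to_csv and parse the stringified list column."""
    return pd.read_csv(path, converters={list_col: ast.literal_eval})

holdout = pd.DataFrame({"userID": [1, 2], "holdout_artists": [[1, 2, 3], [4, 5, 6]]})
preds = pd.DataFrame({"userID": [1, 2], "predictions_artists": [[1, 9, 8], [7, 8, 9]]})
print(precision_at_k(holdout, preds, k=3))  # approximately (0.1667, 0.5)
```

In the notebook, one would presumably call load_list_csv('data/evaluation/test-set.csv', 'holdout_artists') and load each prediction file with the 'predictions_artists' column.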